Assignment: VAST Mini-Challenge 2

Yong Kai Lim https://limyongkai.netlify.app/ (Singapore Management University)
07-17-2021

1. Overview

In the roughly twenty years that Tethys-based GAStech has been operating a natural gas production site in the island country of Kronos, it has produced remarkable profits and developed strong relationships with the government of Kronos. However, GAStech has not been as successful in demonstrating environmental stewardship.

In January, 2014, the leaders of GAStech are celebrating their new-found fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.

2. Objectives

Historical vehicle tracking data and transaction data from loyalty and credit cards will be used to address the following:

  1. The most popular locations and when they are popular
  2. Infer the owner of each credit card and loyalty card
  3. Identify potential informal or unofficial relationships among GAStech personnel
  4. Analyze suspicious activity of the missing personnel prior to the disappearance

3. Data Sources

The data sources are publicly available on the VAST Challenge 2021 website under the Mini-Challenge 2 subsection. The data used for the project are as follows:

Figure 1: Map of Abila, Kronos

LastName    FirstName  BirthDate   BirthCountry  Gender
Bramar      Mat        1981-12-19  Tethys        Male
Ribera      Anda       1975-11-17  Tethys        Female
Pantanal    Rachel     1984-08-22  Tethys        Female
Lagos       Linda      1980-01-26  Tethys        Female
Mies Haber  Ruscella   1964-04-26  Kronos        Female
Forluniau   Carla      1981-06-02  Kronos        Female

LastName  FirstName  CarID  CurrentEmploymentType   CurrentEmploymentTitle
Calixto   Nils       1      Information Technology  IT Helpdesk
Azada     Lars       2      Engineering             Engineer
Balas     Felix      3      Engineering             Engineer
Barranco  Ingrid     4      Executive               SVP/CFO
Baza      Isak       5      Information Technology  IT Technician
Bergen    Linnea     6      Information Technology  IT Group Manager

timestamp      location             price  last4ccnum
1/6/2014 7:28  Brew’ve Been Served  11.34  4795
1/6/2014 7:34  Hallowed Grounds     52.22  7108
1/6/2014 7:35  Brew’ve Been Served  8.33   6816
1/6/2014 7:36  Hallowed Grounds     16.72  9617
1/6/2014 7:37  Brew’ve Been Served  4.24   7384
1/6/2014 7:38  Brew’ve Been Served  4.17   5368

Timestamp            id  lat       long
01/06/2014 06:28:01  35  36.07623  24.87469
01/06/2014 06:28:01  35  36.07622  24.87460
01/06/2014 06:28:03  35  36.07621  24.87444
01/06/2014 06:28:05  35  36.07622  24.87425
01/06/2014 06:28:06  35  36.07621  24.87417
01/06/2014 06:28:07  35  36.07619  24.87406

timestamp   location             price  loyaltynum
01/06/2014  Brew’ve Been Served  4.17   L2247
01/06/2014  Brew’ve Been Served  9.60   L9406
01/06/2014  Hallowed Grounds     16.53  L8328
01/06/2014  Coffee Shack         11.51  L6417
01/06/2014  Hallowed Grounds     12.93  L1107
01/06/2014  Brew’ve Been Served  4.27   L4034

4. Literature Review

4.1 Past MITB Visual Analytics projects were reviewed and evaluated prior to the assignment.

4.2 The solutions submitted for VAST Challenge 2014 were also reviewed on their repository webpage (“VAST Challenge 2014: MC2 - Patterns of Life Analysis” 2014).

5. Tasks and Questions:

1. Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies?

The following packages are loaded for data preparation and visualisation.

packages = c('tidyverse', 'lubridate', 'hms', 'MASS',
             'ggplot2', 'cdparcoord', 'ggiraph', 'plotly', 
             'geosphere', 'sf','rgeos', 'crosstalk',
             'raster', 'tmap','visNetwork','ggraph','tidygraph',
             'ggalluvial')

for(p in packages){
  if(!require(p, character.only=T)){
    install.packages(p)
  }
  library(p, character.only=T)
}

The credit card and loyalty card datasets were loaded and the structure was checked.

glimpse(cc)
Rows: 1,490
Columns: 4
$ timestamp  <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty)
Rows: 1,392
Columns: 4
$ timestamp  <chr> "01/06/2014", "01/06/2014", "01/06/2014", "01/06/~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

Customers would usually use their credit card (cc) together with their loyalty card, so joining the two datasets allows each cc number to be tagged to a loyalty card number. A left join of the cc data with the loyalty data on date, location and price will be performed. However, the timestamp fields in both datasets are stored as character strings instead of date-time values. The following adjustments were performed:

## 1. Create column "datetime" in datetime format "YYYY-mm-dd HH:MM:SS"
## 2. Create column "date" in date format "YYYY-mm-dd"
## 3. Change encoding of location names
cc <- as_tibble(lapply(cc, iconv, to="ASCII//TRANSLIT"))
cc <- cc %>% mutate(datetime = mdy_hm(timestamp), date = date(datetime),
                    price = as.numeric(price), last4ccnum=as.factor(last4ccnum)) 

## 1. Create column "date" in date format "YYYY-mm-dd"
## 2. Change encoding of location names
loyalty <- as_tibble(lapply(loyalty, iconv, to="ASCII//TRANSLIT"))
loyalty <- loyalty %>% mutate(date = date(mdy(timestamp)), price=as.numeric(price))

glimpse(cc)
Rows: 1,490
Columns: 6
$ timestamp  <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <fct> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ datetime   <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ date       <date> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
glimpse(loyalty)
Rows: 1,392
Columns: 5
$ timestamp  <chr> "01/06/2014", "01/06/2014", "01/06/2014", "01/06/~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~
$ date       <date> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~

Prior to joining the two datasets, the aggregated summary statistics in Table 1 showed that there were more credit card transactions than loyalty card transactions on every day. This implies that some employees did not use their loyalty card when paying by credit card, so a perfect join of the two datasets was not possible. A left join of the cc and loyalty datasets by location, date and price was performed.

## Summary statistics for cc and loyalty transaction per day
cc_t<-merge((cc %>% group_by(date) %>% summarize(cc_count = n())), 
      (loyalty %>% group_by(date) %>% summarize(loyalty_count = n())), 
      by="date") %>% mutate(diff = cc_count-loyalty_count)
knitr::kable(cc_t, "simple",
             caption="Summary statistics for cc and loyalty transaction per day")
Table 1: Summary statistics for cc and loyalty transaction per day
date cc_count loyalty_count diff
2014-01-06 128 119 9
2014-01-07 130 122 8
2014-01-08 129 122 7
2014-01-09 133 118 15
2014-01-10 116 103 13
2014-01-11 61 51 10
2014-01-12 55 54 1
2014-01-13 121 117 4
2014-01-14 128 123 5
2014-01-15 126 122 4
2014-01-16 131 123 8
2014-01-17 113 108 5
2014-01-18 70 67 3
2014-01-19 49 43 6
## Left join cc with loyalty data
trans <- left_join(cc, loyalty, by=c("location", "date", "price")) %>%
  dplyr::select(-c(timestamp.x, timestamp.y, datetime))
glimpse(trans)
Rows: 1,496
Columns: 5
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <fct> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ date       <date> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ loyaltynum <chr> "L8566", NA, "L8148", "L5553", "L3800", "L2247", ~

The trans data mostly tagged each unique “last4ccnum” to a single “loyaltynum.” However, the number of rows increased from 1,490 to 1,496, implying that some cc transactions matched more than one loyalty record. The most likely cause is that 6 cc transactions each matched two loyalty transactions with the same location, date and price but different loyaltynum values.
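The row inflation described above can be reproduced on a small scale. The following is a minimal sketch with made-up rows (the card and loyalty numbers are hypothetical, not taken from the challenge data) showing how a left join adds one row whenever a key matches twice on the right-hand side.

```r
library(dplyr)

# Toy data (hypothetical values): two credit card rows, and three loyalty
# rows of which two share the same location, date and price.
cc_toy <- tibble(
  location   = c("Hallowed Grounds", "Coffee Shack"),
  date       = as.Date(c("2014-01-06", "2014-01-06")),
  price      = c(16.72, 11.51),
  last4ccnum = c("9617", "1234")
)
loyalty_toy <- tibble(
  location   = c("Hallowed Grounds", "Hallowed Grounds", "Coffee Shack"),
  date       = as.Date(rep("2014-01-06", 3)),
  price      = c(16.72, 16.72, 11.51),
  loyaltynum = c("L0001", "L0002", "L0003")
)

# The first cc row matches two loyalty rows, so the join returns 3 rows
joined  <- left_join(cc_toy, loyalty_toy, by = c("location", "date", "price"))
n_extra <- nrow(joined) - nrow(cc_toy)

# Join keys that occur more than once on the loyalty side
dup_keys <- loyalty_toy %>%
  count(location, date, price, name = "n_matches") %>%
  filter(n_matches > 1)
```

Running the same `count()` step on the real loyalty data would list the keys responsible for the extra rows, assuming the duplicates sit on the loyalty side as suggested above.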

To investigate cc numbers tagged to multiple loyalty card numbers, and vice versa, the data was transformed and visualised using an interactive parallel coordinate plot in Figure 2. Clicking on either vertical axis, “last4ccnum” or “loyaltynum”, highlights only the matching lines.

bind_rows(
  trans %>% na.omit() %>% 
    group_by(last4ccnum)%>% filter(n_distinct(loyaltynum)>1),
  trans %>% na.omit() %>%
    group_by(loyaltynum) %>% filter(n_distinct(last4ccnum)>1)
) %>% distinct() %>% mutate(last4ccnum = as.character(last4ccnum)) %>%
  dplyr::select(last4ccnum,loyaltynum) %>%
  discparcoord(k=1000, 
               interactive=TRUE, 
               name="Multiple tags of CC and loyalty number")

Figure 2: Parallel Coordinate plot of CC with multiple tags to Loyalty card number

Selecting the credit card numbers ending 8332, 7889, 5921, 5368, 4948 and 4795 revealed that each was tagged to two different loyalty card numbers, one of which had a low transaction count, represented by the dark brown line. Drilling down on these 6 credit card numbers in the trans data showed that the low-count pairing had only 1 transaction. This implies that two loyalty card transactions recorded the same date, location and price, resulting in a one-to-many join that fulfilled all conditions. Hence, these 6 rows account for the difference in row count between the original cc data and the trans data.

Credit card number 1286 was tagged to loyalty numbers L3288 and L3572 with 15 and 13 transactions respectively. Loyalty number L3288 was also tagged to cc number 9241 with 13 transactions. A plausible deduction is that L3288 belongs to the owner of cc 9241 and L3572 belongs to the owner of cc 1286, but the owner of cc 1286 often paid while using loyalty card L3288. This could suggest a close relationship between the owners of cc 1286 and cc 9241.

Loyalty number L6267 was tagged to cc numbers 6899 and 6691 with 23 and 20 transactions respectively, while cc 6899 and cc 6691 were each tagged only to that loyalty card. One possible deduction is that the same person owns both credit cards and uses loyalty card L6267. Another is that loyalty card L6267 is shared between the owners of cc 6899 and cc 6691, which would suggest a close relationship between them.
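The two kinds of ambiguity discussed above (one credit card with several loyalty cards, and one loyalty card with several credit cards) can be separated with grouped filters. The sketch below uses only the pair counts quoted in the text; the table name `pairs` and the column `n_trans` are illustrative names, not objects from the analysis.

```r
library(dplyr)

# Pair counts quoted in the text above
pairs <- tibble(
  last4ccnum = c("1286", "1286", "9241", "6899", "6691"),
  loyaltynum = c("L3288", "L3572", "L3288", "L6267", "L6267"),
  n_trans    = c(15, 13, 13, 23, 20)
)

# Credit cards linked to more than one loyalty card
multi_loyalty <- pairs %>%
  group_by(last4ccnum) %>%
  filter(n_distinct(loyaltynum) > 1) %>%
  ungroup()

# Loyalty cards linked to more than one credit card
shared_loyalty <- pairs %>%
  group_by(loyaltynum) %>%
  filter(n_distinct(last4ccnum) > 1) %>%
  ungroup()

unique(multi_loyalty$last4ccnum)   # "1286"
unique(shared_loyalty$loyaltynum)  # "L3288" "L6267"
```

The filters only surface the ambiguous pairings. Deciding which pairing reflects ownership still relies on the judgement described in the text.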

With this information, a new dataset card_tag was created to tag each cc number to its owner’s loyalty card number. However, 409 transactions in the trans dataset were not tagged.

## Tag owners of credit card to loyalty card number
card_tag <- trans %>% 
  na.omit() %>%
  group_by(last4ccnum, loyaltynum )%>%
  summarize(count_d = n()) %>%
  filter(count_d > 1) %>%
  filter(!(last4ccnum == 1286 & loyaltynum =="L3288")) %>%
  dplyr::select(-(count_d))

The 409 untagged cc transactions were analysed by first mapping each cc number to its loyalty card number. Thereafter, a left join of the untagged transactions to the loyalty data by the fields “date,” “location” and “loyaltynum” was performed. From Figure 3, it was observed that most of the differences between the cc price and the loyalty price cluster at 20, 40, 60 and 80 dollars. Differences in multiples of 20 dollars could suggest some form of discount or rebate, as mentioned in the background.

A deliberate shortfall is unlikely, as those transactions were evenly spread across different days and locations. Furthermore, as the price differences occur across multiple cc and loyalty cards, it is unlikely that the shortfall was targeted at specific owners or specific locations.
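The check for differences in multiples of 20 dollars can be sketched as follows. The prices are hypothetical and the comparison is done in whole cents, an assumption made here to sidestep floating-point rounding when testing divisibility.

```r
# Hypothetical paired prices for the same purchase on both records
price_cc      <- c(32.50, 51.20, 74.00, 18.90, 25.35)
price_loyalty <- c(12.50, 11.20, 14.00, 18.90, 20.00)

# Work in whole cents so that the modulo test is exact
diff_cents <- round((price_cc - price_loyalty) * 100)

# TRUE when the gap is a positive multiple of 20 dollars (2000 cents)
is_multiple_of_20 <- diff_cents > 0 & (diff_cents %% 2000 == 0)
```

In this toy input the first three pairs differ by 20, 40 and 60 dollars and are flagged, while the exact match and the 5.35-dollar gap are not.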

## Non matching cc and loyalty card transaction
non_match_cc <- anti_join(cc, (trans %>% na.omit())) %>% left_join(card_tag)
## Non matching loyalty card and cc transaction
non_match_loy <- anti_join(loyalty, (trans%>%na.omit()))
## All non matching transaction
non_match_trans <- left_join(non_match_cc, 
                             non_match_loy, 
                             by=c("location", "date", "loyaltynum" )) %>% 
  na.omit() %>% 
  mutate(diff=price.x-price.y) %>% 
  filter(diff>=0)

## Remove outliers, select columns and visualise using parallel coordinate plot
non_match_trans %>% 
  filter(!(diff %in% boxplot(non_match_trans$diff, plot = FALSE)$out)) %>% 
  dplyr::select(last4ccnum,loyaltynum,location,price.x,price.y, diff) %>%
  rename(price_cc = price.x, price_loyalty = price.y) %>%
  mutate(last4ccnum = as.character(last4ccnum)) %>%
  discparcoord(k=1000, 
               interactive=TRUE, 
               name="Non-matching transactions by cc and loyalty number")

Figure 3: Parallel Coordinate plot of CC to Loyalty card number with discount

A subset of cc transactions was not tagged to any loyalty card transaction. The owners may have forgotten their loyalty card when making the transactions, or these transactions may involve suspicious activities where the owners deliberately avoided using their loyalty card. This subset of transactions was visualised with a boxplot in Figure 4. The boxplot displayed one extreme outlier at Frydos Autosupply n’ More. Hovering over the red outlier circle indicates that the owner of cc 9551 spent 10,000 dollars in that transaction, whereas the median price at Frydos Autosupply n’ More is 134.9 dollars. This transaction was extremely suspicious because of the extreme amount and because the owner did not use his/her loyalty card despite the high transaction value.
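The outlier reading from the boxplot can also be done numerically. The sketch below flags per-location outliers with the same 1.5 x IQR rule that a boxplot uses, via `boxplot.stats()`. Apart from the 10,000-dollar value and the 134.9 median quoted above, the prices are invented for illustration.

```r
library(dplyr)

# Toy prices; only 10000 and 134.9 come from the text above
toy <- tibble(
  location = c(rep("Frydos Autosupply n' More", 5), rep("Coffee Shack", 4)),
  price    = c(120, 134.9, 150, 98, 10000, 10, 12, 11, 13)
)

flagged <- toy %>%
  group_by(location) %>%
  mutate(med        = median(price),
         is_outlier = price %in% boxplot.stats(price)$out) %>%
  ungroup() %>%
  filter(is_outlier)
```

Here `flagged` keeps the single 10,000-dollar row together with its location median, which gives the same comparison as hovering over the red point.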

## Transactions match equally from cc and loyalty card
match_cc <- left_join((left_join(cc, card_tag)), 
                      loyalty, by=c("location","date","price")) %>% 
  na.omit() %>% 
  group_by(last4ccnum, loyaltynum.y) %>% filter(n()>1) %>%
  dplyr::select(-(timestamp.y)) %>%
  rename(timestamp = timestamp.x, 
         loyaltynum_owner = loyaltynum.x, 
         loyaltynum_trans = loyaltynum.y) %>%
  mutate(trans_match = 1)

## Transactions match with difference in 20 dollars denomination
match_cc_dis <- anti_join(cc, match_cc, by=c("date","location","price")) %>% 
  left_join((non_match_trans %>% filter(diff %in% c(20, 40, 60, 80))), 
            by=c("location", "last4ccnum","date","price"="price.x")) %>% 
  na.omit() %>%
  dplyr::select(-timestamp.x, -datetime.y, -timestamp.y) %>%
  rename(datetime = datetime.x, 
         loyaltynum_trans = loyaltynum, 
         price_loy = price.y) %>%
  mutate(trans_match = 1)

## Transactions with cc transactions but not match to loyalty card
no_loy_trans <- anti_join(cc, match_cc, by=c("date","location","price")) %>%
  anti_join(match_cc_dis, by=c("date","location","price")) %>%
  mutate(trans_match = 0)

## Tagging all information on transactions from cc and loyalty to final_trans
final_trans <- bind_rows(match_cc, match_cc_dis, no_loy_trans)

## Determine median price per location
median_price <- no_loy_trans %>% 
           group_by(location) %>% 
           summarize(med=median(price))
## Data transformation for boxplot plotting
no_loy_trans_1 <- no_loy_trans %>% 
  left_join(median_price, by=c("location"))
## Boxplot function
boxplot1 <- ggplot(no_loy_trans_1, aes(x=location, y=price, text=paste("Median:", med))) +
  geom_boxplot(outlier.color="red",outlier.fill="red") + 
  geom_point(alpha=0) + scale_y_log10() + coord_flip() +
  ggtitle("Boxplot of CC transaction NOT tagged to loyalty card") +
  theme(axis.title=element_blank(),
        plot.title=element_text(size=16, face="bold")) +
  xlab("Price")
boxplot_p1<-ggplotly(boxplot1)
boxplot_p1$x$data[[1]]$hoverinfo <- "none"
# overrides black outline of outliers
boxplot_p1$x$data[[1]]$marker$line$color = "red"
# overrides black extreme outlier color
boxplot_p1$x$data[[1]]$marker$outliercolor = "red"
# overrides black not as extreme outlier color
boxplot_p1$x$data[[1]]$marker$color = "red"
boxplot_p1

Figure 4: Boxplot of cc transaction without loyalty card

To determine the most popular locations in Abila, Figure 5 shows the transaction frequency and the transaction prices for each location. The plot “Number of transactions per day by location” shows the number of transactions at each location per day, split by time period, with weekends shaded in grey. The plot “Boxplot of transaction prices per location” shows the price distribution for each location, with the price axis on a log scale. The following insights were inferred from the plots.

1. Transactions occurring only on weekday mornings.

These 3 locations seem to be coffee shops based on their names or logos, and Brew’ve Been Served is the most popular among them. Based on the location, price and timestamp of the transactions, a possible deduction is that these coffee shops serve take-out coffee and are located between the employees’ homes and GAStech. The median transaction price was similar for all 3 locations at around 12 dollars. From the map, Coffee Cameleon is the nearest to GAStech but Brew’ve Been Served has more transactions, making it the most popular morning take-out coffee choice among the employees.

2. Transactions occurring only on weekday afternoons.

Based on their names or logos, these 4 locations seem to be food and beverage outlets. The median price for these locations ranges from 12 to 15 dollars. A possible deduction is that these locations operate only at weekday lunch time and serve drinks such as coffee, as they have a similar price range to the take-out coffee mentioned previously.

3. Transactions occurring daily during the afternoon or night period.

These 6 locations have transactions in both the afternoon and night periods on all days, with a median price of 28 to 32 dollars. The location names, logos and transaction trends indicate that these are also food and beverage outlets. However, the higher median price and the frequent transactions during both the afternoon and night periods suggest that these are restaurants that serve full meals for lunch and dinner.

4. Higher value transactions on weekdays only.

These locations have a higher median price than the others. The company names and logos suggest that the locations are customers or suppliers of GAStech. As the bulk of the transactions are on weekdays, a possible deduction is that these locations are work-related. The higher median price could be due to purchases of raw materials, which would explain the much higher prices transacted on weekdays only.

5. Suspicious transaction.

In the boxplot for Frydos Autosupply n’ More, there is an extreme outlier of 10,000 dollars while the median price was only 149 dollars. This particular transaction was flagged in the previous analysis of cc transactions that were not tagged to a loyalty card. As individuals are more likely to use their loyalty card in conjunction with their credit card, the absence of a loyalty card record for this transaction adds to the suspicion.

Transactions were made at Kronos Mart during the midnight period on Monday and on both Sundays. These 5 midnight transactions are unusual, as they occur at only one location. None of them was tagged to a loyalty card either. This raises suspicion about the cc owners.

In the boxplot for Albert’s Fine Clothing, there was an extreme outlier of 1,239.41 dollars while the median was only 211.47 dollars. At about six times the median price, this might be a suspicious transaction. However, judging by the frequency of transactions at Albert’s Fine Clothing, it seems to be a common place to buy clothing. A possible explanation is that the person was buying a lot of clothing for family or friends, amounting to a much higher price than usual.
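The time-period and weekend labels used in the frequency plot can be illustrated on a few timestamps. This is a minimal sketch using `cut()` as an alternative to a chain of conditions. The timestamps are illustrative, and `week_start = 1` is set explicitly because lubridate's default numbering starts the week on Sunday.

```r
library(lubridate)

# Illustrative timestamps (a Monday, a Friday, a Saturday, a Monday)
ts <- ymd_hm(c("2014-01-13 03:00", "2014-01-10 09:30",
               "2014-01-11 12:15", "2014-01-13 19:20"))

# Four six-hour bins, each closed on the left: [0,6), [6,12), [12,18), [18,24)
time_bin <- cut(hour(ts),
                breaks = c(0, 6, 12, 18, 24),
                labels = c("Midnight", "Morning", "Afternoon", "Night"),
                right  = FALSE)

# week_start = 1 numbers Monday as 1, so Saturday is 6 and Sunday is 7
day_type <- ifelse(wday(ts, week_start = 1) >= 6, "weekend", "weekday")
```

With these inputs the bins come out as Midnight, Morning, Afternoon and Night, and only the 11 January timestamp is labelled as a weekend.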

## Data manipulation to add more factors
final_trans_1 <- final_trans %>% ungroup() %>%
  ## week_start = 1 numbers Monday as 1, so Saturday is 6 and Sunday is 7
  mutate(day = as.factor(wday(date, week_start = 1)),
         wkday = ifelse(day == "6" | day =="7", "weekend", "weekday"),
         time_bin = case_when(
              hour(datetime)>=0 & hour(datetime)<6 ~ "Midnight",
              hour(datetime)>=6 & hour(datetime)<12 ~ "Morning",
              hour(datetime)>=12 & hour(datetime) <18 ~ "Afternoon",
              hour(datetime)>=18 ~ "Night"),
          time_bin = factor(time_bin, 
                      levels = c("Midnight", "Morning", "Afternoon", "Night"))
        )

## Data transformation to plot Bar graph for transaction frequency
freq<- final_trans_1 %>% 
  group_by(location, date, time_bin) %>% summarize(co=n())
freq_location <- ggplot(freq, aes(x=date, y=co, fill=time_bin, 
  tooltip= paste(co, " transactions at ",location, " on ", date, time_bin))) +
  geom_col_interactive() + 
  annotate(geom="rect", xmin=ymd(20140111)-.5, xmax=ymd(20140113)-.5, 
           ymin=-Inf, ymax=Inf, fill='dark grey' , alpha=0.5) +
  annotate(geom="rect", xmin=ymd(20140118)-.5, xmax=ymd(20140120)-.5, 
           ymin=-Inf, ymax=Inf, fill='dark grey' , alpha=0.5) +
  facet_wrap(~location) +
  ggtitle("Number of transactions per day by location") +
  xlab("Date") + ylab("Number of transactions") +
  labs(fill="Time period") +
  theme(plot.title=element_text(size=20,face="bold"),
        axis.title=element_text(size=14,face="bold"),
        strip.text = element_text(size = 6),
        axis.text=element_text(size=6),
        axis.text.x=element_text(angle=45, hjust=1),
        legend.position="bottom") 

# Find median price per location
median_price_final <- final_trans_1 %>% 
           group_by(location) %>% 
           summarize(med=median(price))
## Data transformation for boxplot plotting
final_trans_1 <- final_trans_1 %>% 
  left_join(median_price_final, by=c("location"))

## Boxplot plotting
boxplot <- ggplot(final_trans_1, aes(x=location, y=price, text=paste("Median:", med))) +
  geom_boxplot(outlier.color="red",outlier.fill="red") + 
  geom_point(alpha=0) + scale_y_log10() + coord_flip() +
  ggtitle("Boxplot of transaction prices per location") +
  theme(axis.title=element_blank(),
        plot.title=element_text(size=20, face="bold"))
boxplot_p<-ggplotly(boxplot)
boxplot_p$x$data[[1]]$hoverinfo <- "none"
# overrides black outline of outliers
boxplot_p$x$data[[1]]$marker$line$color = "red"
# overrides black extreme outlier color
boxplot_p$x$data[[1]]$marker$outliercolor = "red"
# overrides black not as extreme outlier color
boxplot_p$x$data[[1]]$marker$color = "red"

## Plot Interactive Bar chart and Boxplot
girafe(ggobj=freq_location, width_svg = 7, height_svg = 7)

Figure 5: Visualize transactions history

boxplot_p

Figure 5: Visualize transactions history

2. Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find?

The GPS dataset contains coordinates that were logged every few seconds while a car was moving. The data was transformed to keep only the stationary GPS coordinates for each car by identifying rows where the time gap between consecutive GPS logs of the same car id was more than 5 minutes. A threshold of 5 minutes was selected because the waiting time at traffic lights was taken to be around 3 to 5 minutes, so the upper bound was chosen to exclude stops caused by traffic lights.
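The gap rule described above can be sketched on a toy log. The timestamps below are hypothetical, and the column name `gap_mins` is an illustrative choice. The 5-minute threshold is the one stated in the text.

```r
library(dplyr)

# Toy GPS log for one car (hypothetical timestamps)
gps_toy <- tibble(
  id = 35,
  timestamp = as.POSIXct(c("2014-01-06 06:28:01", "2014-01-06 06:28:03",
                           "2014-01-06 06:28:05", "2014-01-06 07:40:10",
                           "2014-01-06 07:40:12"), tz = "UTC")
)

# Minutes until the next log of the same car; a long gap marks a stop
stops <- gps_toy %>%
  group_by(id) %>%
  arrange(timestamp, .by_group = TRUE) %>%
  mutate(gap_mins = as.numeric(difftime(lead(timestamp), timestamp,
                                        units = "mins"))) %>%
  ungroup() %>%
  filter(!is.na(gap_mins), gap_mins > 5)
```

Only the 06:28:05 record survives, because the next log arrives about 72 minutes later. Fixing `units = "mins"` avoids relying on the unit that `difftime` would otherwise pick automatically.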

2.1 The first anomaly to be investigated is the high transaction price of 10,000 dollars at Frydos Autosupply n’ More on the night of 13/01/2014 by cc 9551.

Based on the location name and logo, it is highly likely to be a vehicle repair shop. The absence of a matching loyalty card transaction made the transaction more suspicious. The transaction records for cc 9551 on 13/01/2014 were extracted and are shown in Table 2.

There were 5 transactions and 3 of them did not match the loyalty card transaction data. Since the other 2 transactions did match, it is unlikely that the owner simply forgot to bring his/her loyalty card that day. In both the afternoon and the night period, two transactions were made 10 minutes apart, and in each pair one did not use the loyalty card. To further analyse the transactions, the GPS log data was visualised on the Abila map.

## Transactions made with cc 9551 on 13/01/2014
knitr::kable(final_trans_1 %>% 
      filter(last4ccnum==9551 & date == dmy(13012014)) %>%
      dplyr::select(datetime,location,price,last4ccnum,trans_match)%>% 
      arrange(datetime), "simple",
      caption="Table of transactions for cc 9551 on 13/01/2014") 
Table 2: Table of transactions for cc 9551 on 13/01/2014
datetime             location                   price     last4ccnum  trans_match
2014-01-13 06:04:00  Daily Dealz                2.01      9551        0
2014-01-13 13:18:00  U-Pump                     55.25     9551        0
2014-01-13 13:28:00  Hippokampos                30.51     9551        1
2014-01-13 19:20:00  Frydos Autosupply n’ More  10000.00  9551        0
2014-01-13 19:30:00  Ouzeri Elian               28.75     9551        1

Figure 6 shows the GPS travel lines of all cars for 13/01/2014. From the frequency plot in Figure 5, only 2 transactions were performed at U-Pump throughout the 2 weeks of data. Hovering over the GPS lines on top of U-Pump reveals that only car id 24 visited the location. Since U-Pump is a petrol kiosk, it is very likely that the owner of car id 24 used cc 9551 to make the transaction at “U-Pump.”

The GPS line of car id 24 is marked in red and its stationary GPS coordinates are marked as blue dots, which represent the locations where the car was parked.

Hovering over the blue dot near U-Pump on the map shows the car stopping at 12:35:15 and leaving at 13:22:01. This matches the transaction performed at U-Pump at 13:18:00. With a strong possibility that the owner of car id 24 uses cc 9551, a heatmap was used to visualise the time periods of the day when the car was on the move. The horizontal blue bars in Figure 6 represent the periods when the car was moving.

Thereafter, the car left U-Pump at 13:22:01 and arrived back at GAStech at 13:27:14. Hence, the driver could not have made the transaction at Hippokampos at 13:28:00.

In the evening, the GPS log showed that the car left GAStech at 17:57:01, stopped around Ipsilantou Avenue at 18:00:31 and drove off at 19:29:01. The 10,000 dollars transaction at Frydos Autosupply n’ More was performed at 19:20:00, which fits the car’s GPS timeline. Although the car did not stop directly at Frydos Autosupply n’ More, the distance is around 500 metres, so the owner could have walked there to make the 10,000 dollars transaction.
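A walking-distance estimate of this kind can be checked with a great-circle distance between the parking spot and the shop. The sketch below implements the haversine formula in base R. The coordinates are hypothetical points inside the Abila bounding box, not the real positions, and the 600-metre walking threshold is an assumption. The geosphere package loaded earlier provides `distHaversine()` for the same purpose.

```r
# Great-circle distance in metres between two (lat, lon) points in degrees
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Hypothetical coordinates, NOT the real positions of the car or the shop
d <- haversine_m(36.0600, 24.8800, 36.0645, 24.8800)

# Assumed walking threshold in metres
walkable <- d <= 600
```

For these two points the distance comes out at roughly 500 metres, which is the order of magnitude discussed above.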

Thereafter, the car started driving north at 19:29:01 and stopped at 19:31:35. This rules out the driver making the transaction at Ouzeri Elian at 19:30:00.

The transaction records of cc 9551 do not fit the movements of car id 24 perfectly. The two transactions that the owner of car id 24 could have made had no matching loyalty card record, whereas the two transactions that the owner could not have made were both matched to a loyalty card transaction. This pattern reinforces the suspicion around the transactions made on cc 9551. A probable deduction is that cc 9551 does not belong to the owner of car id 24, and that the real owner of cc 9551 also used it during the day.
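The timeline reasoning above amounts to checking whether each transaction time falls inside a window during which the car was parked away from GAStech. The following is a minimal base R sketch using the stop times and transaction times quoted in the text. The data frame names and the helper `to_time()` are illustrative.

```r
# Helper to build timestamps on 13/01/2014 (UTC is an arbitrary choice)
to_time <- function(x) as.POSIXct(paste("2014-01-13", x), tz = "UTC")

# Stop windows of car id 24 away from GAStech, as read from the map
stops <- data.frame(
  place = c("near U-Pump", "near Ipsilantou Avenue"),
  start = to_time(c("12:35:15", "18:00:31")),
  end   = to_time(c("13:22:01", "19:29:01"))
)

# Afternoon and night transactions of cc 9551 from Table 2
trans <- data.frame(
  location = c("U-Pump", "Hippokampos",
               "Frydos Autosupply n' More", "Ouzeri Elian"),
  time     = to_time(c("13:18:00", "13:28:00", "19:20:00", "19:30:00"))
)

# TRUE when the transaction time falls inside any stop window
trans$in_stop <- vapply(
  seq_len(nrow(trans)),
  function(i) any(trans$time[i] >= stops$start & trans$time[i] <= stops$end),
  logical(1)
)
```

The U-Pump and Frydos Autosupply n' More transactions fall inside a stop window, while the Hippokampos and Ouzeri Elian transactions do not, which mirrors the two feasible and two infeasible transactions described above.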

## Load Map and SHP file
bgmap <- raster("datasets/MC2-tourist.tif")
abila_st <- st_read(dsn="datasets/Geospatial", layer="Abila")
Reading layer `Abila' from data source `C:\limyongkai\distill_blog\_posts\2021-07-10-vastmc2\datasets\Geospatial' using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84
## Transform the structure of GPS data
gps <- gps %>% mutate(timestamp=mdy_hms(Timestamp),id=as_factor(id))
gps1 <- st_as_sf(gps, coords=c("long","lat"), crs=4326)
gps1 <- gps1 %>% group_by(id) %>% arrange(timestamp) %>%
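  ## Time gaps in minutes; units are fixed so difftime cannot pick them automatically
  mutate(start_diff= as.numeric(difftime(timestamp, lag(timestamp,default=first(timestamp)), units="mins")),
         stop_diff= as.numeric(difftime(lead(timestamp), timestamp, units="mins")),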
         date = as.Date(timestamp)) %>%
  rename(gps.coord=geometry) 

## Convert coordinates to geometry, filter date and convert to LINE string
gps_sf <- st_as_sf(gps, coords=c("long","lat"), crs=4326) 
gps_sf1 <- gps_sf %>% filter(as.Date(gps_sf$timestamp) == dmy(13012014))
gps_path1 <- gps_sf1 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_sf_24 <- gps_sf1 %>% filter(as.Date(gps_sf1$timestamp) == dmy(13012014), id==24)
gps_path_24 <- gps_sf_24 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_24_points <- gps1 %>% filter(id ==24 & date == dmy(13012014)) %>% 
  filter(start_diff>5 | stop_diff >5) %>% 
  mutate(start_vec=ifelse(start_diff>5,1,0), stop_vec=ifelse(stop_diff>5,1,0))

gps_pts <- gps1 %>% filter(start_diff >5 | stop_diff >5)
gps_pts <- gps_pts %>% group_by(id) %>% 
  mutate(start_vec=ifelse(start_diff>5,1,0), 
         stop_vec=ifelse(stop_diff>5,1,0)) %>%
  filter(!(start_vec==1 & stop_vec==1)) %>% 
  mutate( start.time = ifelse(start_vec==1, timestamp,NA),
          end.time=ifelse(stop_vec==1, timestamp, NA),
          start.gps = ifelse(start_vec==1, gps.coord,NA), 
          end.gps=ifelse(stop_vec==1, gps.coord,NA),
          end.time = ifelse(start_vec==1, lead(end.time), end.time),
          end.gps = ifelse(start_vec==1, lead(end.gps), end.gps)) %>% 
  filter(!is.na(start.time))%>% 
  mutate(start.time= as_datetime(start.time), 
         end.time=as_datetime(end.time)) %>% 
  dplyr::select(id, date, start.time, end.time, start.gps, end.gps) %>%
  mutate(hr=hour(start.time),
         time.diff=round(difftime(end.time,start.time,units='mins'),2),
         dummy=1) 
gps24 <- gps_pts %>% filter(id==24&date==dmy(13012014))
hm24<-ggplot(gps24, aes(x=start.time, y=id, 
    tooltip=paste("Car start time:",start.time,
                  "\nCar stop time:",end.time,
                  "\nDriving time (mins):",time.diff))) +
  geom_tile_interactive(aes(fill=dummy)) +
  xlab("Time") + ylab("Car id")+theme(legend.position="none")


## Plot interactive map
tmap_mode("view")
map1<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path1)+
  tm_lines() +
  tm_shape(gps_path_24) +
  tm_lines(col ="red") +
  tm_shape(gps_24_points)+
  tm_dots(col="blue", shape=30)
tmap_leaflet(map1)

Figure 6: GPS data for 13/01/2014

girafe(ggobj=hm24)

Figure 6: GPS data for 13/01/2014

2.2 The second anomaly was the set of early-morning transactions at Kronos Mart seen in the Figure 5 frequency plot. Table 3 below displays all transaction records at Kronos Mart. Five out of the ten transactions were performed in the early hours, between 3am and 4am, on three different days, and three of the five occurred on 19/01/2014. These transactions were particularly unusual and further investigation was conducted.

## Transactions on 13/01/2014 at "Frydos Autosupply n' More"
knitr::kable(final_trans_1 %>% 
      filter(location == "Kronos Mart") %>%
      dplyr::select(datetime,location,price,last4ccnum,trans_match)%>% 
      arrange(datetime), "simple",
      caption="Table of transactions at Kronos Mart") 
Table 3: Table of transactions at Kronos Mart
datetime             location      price  last4ccnum  trans_match
-------------------  -----------  ------  ----------  -----------
2014-01-10 09:30:00  Kronos Mart  203.91  7688        0
2014-01-12 03:39:00  Kronos Mart  277.26  8156        0
2014-01-13 03:00:00  Kronos Mart  147.30  5407        0
2014-01-13 08:01:00  Kronos Mart  159.06  6816        0
2014-01-14 08:20:00  Kronos Mart   58.85  6899        0
2014-01-16 07:30:00  Kronos Mart  298.83  7108        0
2014-01-17 08:08:00  Kronos Mart  286.24  1415        0
2014-01-19 03:13:00  Kronos Mart   87.66  3484        0
2014-01-19 03:45:00  Kronos Mart  194.51  9551        0
2014-01-19 03:48:00  Kronos Mart  150.36  8332        0

The GPS records for 19/01/2014 were visualised to investigate these transactions. In Figure 7, no GPS track passed by or stopped in the vicinity of Kronos Mart on 19/01/2014. The closest stop, represented by the blue dot, was by car id 30 at ROBERTS AND SONS from 13:20:06 to 14:23:01. The timing of this stop does not coincide with the cc transaction timings.

Hence, one possible deduction is that the owners of cc 3484, 9551 and 8332 live within walking distance of Kronos Mart, which would eliminate the need to drive their employee cars there. Another possibility is that the cc owners used their personal vehicles to get there, leaving no GPS record from the employee-issued vehicles. Notably, cc 9551 also appears among these transactions, which warrants additional investigation.
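The early-hour pattern can also be checked programmatically. The following is a minimal base R sketch that uses the ten Kronos Mart rows from Table 3; the 05:00 cut-off for "early hours" is an assumption made for illustration and is not a threshold used elsewhere in this report.

```r
# Kronos Mart transactions copied from Table 3
kronos <- data.frame(
  datetime = as.POSIXct(c("2014-01-10 09:30:00", "2014-01-12 03:39:00",
                          "2014-01-13 03:00:00", "2014-01-13 08:01:00",
                          "2014-01-14 08:20:00", "2014-01-16 07:30:00",
                          "2014-01-17 08:08:00", "2014-01-19 03:13:00",
                          "2014-01-19 03:45:00", "2014-01-19 03:48:00"),
                        tz = "UTC"),
  last4ccnum = c(7688, 8156, 5407, 6816, 6899, 7108, 1415, 3484, 9551, 8332)
)

# ASSUMPTION: anything before 05:00 counts as an early-hour transaction
early_cutoff_hr <- 5
kronos$hour <- as.integer(format(kronos$datetime, "%H", tz = "UTC"))
early <- kronos[kronos$hour < early_cutoff_hr, ]

# Number of early-hour transactions per calendar day
early_per_day <- table(format(early$datetime, "%Y-%m-%d", tz = "UTC"))
print(early_per_day)
```

Grouping by calendar day makes it easy to see which dates concentrate the unusual transactions.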

## Map geometry for 19012014
gps_sf2 <- gps_sf %>% filter(as.Date(gps_sf$timestamp) == dmy(19012014))
gps_path2 <- gps_sf2 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_path2 <- gps_path2 %>% filter(id !=29)
gps_points2 <- gps1 %>% filter(date == dmy(19012014)) %>% 
  filter(start_diff>5 | stop_diff >5) %>% 
  mutate(start_vec=ifelse(start_diff>5,1,0), stop_vec=ifelse(stop_diff>5,1,0))

## Plot interactive map
tmap_mode("view")
map2<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path2)+
  tm_lines() +
  tm_shape(gps_points2)+
  tm_dots(col="blue", shape=30)
tmap_leaflet(map2)

Figure 7: GPS data for 19/01/2014

2.3 Lastly, we cross-check and validate the GPS data against the frequency of transactions at each location, starting with the weekday movement. From the earlier section, there were three groups of transaction data: weekday morning transactions only, weekday afternoon transactions only, and high-value transactions on weekdays only. The GPS movement on 07/01/2014 is visualised in Figure 8, as transactions were made at all of these locations on that day.

The stationary car GPS coordinates (blue dots) for Coffee Cameleon and Hallowed Grounds fit the transaction data. However, the blue dots directly on the Brew’ve Been Served logo in the map have stationary timestamps mainly in the afternoon or evening, which does not match the transaction timing at Brew’ve Been Served. Slightly to the south, near the main road of Ipsilantou Avenue, there were multiple stationary GPS coordinates in the morning that do fit the transaction timing at Brew’ve Been Served. The mismatch is therefore probably due to the location logo being misplaced on the map.

There are 4 locations in this group. Based on their names and logos, they appear to be coffee shops, similar to the earlier group.

Table 4 shows the 13 transactions at the 4 locations on 07/01/2014. All 13 transactions share the exact same timestamp of 12:00. However, the stationary GPS timestamps at those locations were in the morning before 09:00, when employees would presumably visit before heading to GAStech for work.

The 4 locations are spread around Abila, and the mismatch with the stationary GPS timestamps is consistent across them. One possible deduction is that the Point of Sale (POS) machines at those locations are faulty. Alternatively, the locations may use the same type of POS machine, one that processes cc transactions in a daily batch at 12:00 instead of in real time.
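A simple way to surface locations with this kind of fixed timestamp is to count the distinct times of day per location. The sketch below is a minimal base R illustration using the 13 rows of Table 4 plus three Kronos Mart rows from Table 3 for contrast; the `min_trans` threshold is an assumption, added so that a location with a single row is not flagged.

```r
# Locations and timestamps taken from Table 4 (all 12:00) and Table 3
trans <- data.frame(
  location = c("Coffee Shack",
               rep("Brewed Awakenings", 4),
               rep("Bean There Done That", 4),
               rep("Jack's Magical Beans", 4),
               rep("Kronos Mart", 3)),
  datetime = as.POSIXct(c(rep("2014-01-07 12:00:00", 13),
                          "2014-01-10 09:30:00",
                          "2014-01-12 03:39:00",
                          "2014-01-13 08:01:00"), tz = "UTC")
)

# Time of day only, ignoring the date
trans$time_of_day <- format(trans$datetime, "%H:%M", tz = "UTC")

# Distinct times of day and number of transactions per location
n_distinct_times <- tapply(trans$time_of_day, trans$location,
                           function(x) length(unique(x)))
n_trans <- tapply(trans$time_of_day, trans$location, length)

# ASSUMPTION: require at least 2 transactions before flagging a location
min_trans <- 2
batch_suspects <- names(n_distinct_times)[n_distinct_times == 1 &
                                            n_trans >= min_trans]
print(batch_suspects)
```

In this small extract Coffee Shack has a single row, so it cannot be judged either way; on the full cc dataset it would have enough transactions to be evaluated.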

## Transactions on 07/01/2014 at the 4 coffee shops
knitr::kable(final_trans_1 %>% 
      filter((location == "Jack's Magical Beans" |
             location == "Brewed Awakenings" |
             location == "Coffee Shack" |
             location == "Bean There Done That") &
             date == dmy(07012014)) %>%
      dplyr::select(datetime,location,price,last4ccnum,trans_match, price_loy)%>% 
      arrange(datetime), "simple",
      caption="Table of transactions at the 4 locations on 07/01/2014") 
Table 4: Table of transactions at the 4 locations on 07/01/2014
datetime             location              price  last4ccnum  trans_match  price_loy
-------------------  --------------------  -----  ----------  -----------  ---------
2014-01-07 12:00:00  Coffee Shack          16.63  7117        1            NA
2014-01-07 12:00:00  Brewed Awakenings      6.72  8332        1            NA
2014-01-07 12:00:00  Bean There Done That   8.03  1321        1            NA
2014-01-07 12:00:00  Jack’s Magical Beans  18.77  9241        1            NA
2014-01-07 12:00:00  Jack’s Magical Beans  19.61  8156        1            NA
2014-01-07 12:00:00  Bean There Done That  51.25  1415        1            11.25
2014-01-07 12:00:00  Jack’s Magical Beans  23.68  6899        1            3.68
2014-01-07 12:00:00  Brewed Awakenings     64.84  3853        1            4.84
2014-01-07 12:00:00  Brewed Awakenings     71.59  2540        1            11.59
2014-01-07 12:00:00  Bean There Done That  53.89  1877        1            13.89
2014-01-07 12:00:00  Bean There Done That  46.25  6895        1            6.25
2014-01-07 12:00:00  Jack’s Magical Beans  69.84  2463        1            9.84
2014-01-07 12:00:00  Brewed Awakenings     12.17  7688        0            NA

Based on the names and logos of the 7 locations, they are likely to be industrial places of interest. The stationary GPS points (blue dots) at these locations show that only truck drivers, with car id 100 and above, visited them. The stationary GPS timestamps also match the cc transaction timestamps. Hence, a possible deduction is that these 7 locations are businesses that work closely with GAStech and that the payments were made by the truck drivers on weekdays. This is consistent with the fact that the trucks only operate during weekday working hours.

## Map geometry for 07012014
gps_sf3 <- gps_sf %>% filter(as.Date(gps_sf$timestamp) == dmy(07012014))
gps_path3 <- gps_sf3 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_points3 <- gps1 %>% filter(date == dmy(07012014)) %>% 
  filter(start_diff>5 | stop_diff >5) %>% 
  mutate(start_vec=ifelse(start_diff>5,1,0), stop_vec=ifelse(stop_diff>5,1,0))

## Plot interactive map
tmap_mode("view")
map3<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path3)+
  tm_lines() +
  tm_shape(gps_points3)+
  tm_dots(col="blue", shape=30)
tmap_leaflet(map3)

Figure 8: GPS data for 07/01/2014

3. Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data? Please limit your answer to 8 images and 500 words.

To tag the owner of each credit card and loyalty card to a car id, we need to combine several factors to triangulate the results. The two conditions used to triangulate across the three datasets are:

  1. The cc transaction timestamp has to fall between the time the car stopped moving, which signifies the car reaching the location, and the subsequent time the car started moving, which signifies the car leaving the location. Using the stationary GPS stop timestamp and the subsequent GPS start timestamp narrows down the candidates.
  2. The stationary GPS coordinates of the car have to be within a reasonable radius of the location coordinates.
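The two conditions can be sketched as a single predicate. The following is a minimal base R illustration, not the code used in this report: `haversine_m()` and `is_candidate()` are hypothetical helper names, the 500 metre default radius is an assumed value, and the coordinates and times in the example are made up.

```r
# Great-circle distance in metres between two lon/lat points (haversine formula)
haversine_m <- function(lon1, lat1, lon2, lat2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# Condition 1: transaction falls between the car stopping and starting again
# Condition 2: the stop position is within max_dist_m of the location
is_candidate <- function(trans_time, stop_time, next_start_time,
                         stop_lon, stop_lat, loc_lon, loc_lat,
                         max_dist_m = 500) {   # ASSUMED radius
  in_window <- trans_time > stop_time & trans_time <= next_start_time
  near <- haversine_m(stop_lon, stop_lat, loc_lon, loc_lat) < max_dist_m
  in_window & near
}

# HYPOTHETICAL example: one stop window, three candidate transactions
trans_time <- as.POSIXct(c("2014-01-19 13:40:00",   # in window, close by
                           "2014-01-19 03:45:00",   # outside the window
                           "2014-01-19 13:40:00"),  # in window, too far
                         tz = "UTC")
stop_time  <- as.POSIXct(rep("2014-01-19 13:20:00", 3), tz = "UTC")
next_start <- as.POSIXct(rep("2014-01-19 14:20:00", 3), tz = "UTC")
stop_lat   <- c(36.0640, 36.0640, 36.0735)

result <- is_candidate(trans_time, stop_time, next_start,
                       stop_lon = 24.8510, stop_lat = stop_lat,
                       loc_lon = 24.8510, loc_lat = 36.0635)
print(result)
```

Only a transaction that satisfies both conditions is kept as a candidate match between a cc number and a car id.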

The location coordinates are assigned by referencing the tourist map of Abila. However, as discovered in the earlier section, the tourist map might not be accurate for this purpose, as the icons on it might not sit at the exact coordinates of the locations.

Furthermore, the tourist map does not mark every location with a logo, which prevents a full join with the locations in the cc transaction data. Table 5 shows the locations from the cc dataset whose logos could not be found on the tourist map of Abila, ranked by number of transactions in descending order. Several of these locations have a high volume of transactions, so their GPS coordinates need to be mapped.

## Number of transactions at locations with no logo on the tourist map
locations <- data.frame(location = cc$location) %>% 
  group_by(location) %>% summarize(number_transactions=n())
knitr::kable(locations %>% 
      dplyr::filter(location == "Abila Zacharo" |
                    location == "Brewed Awakenings" |
                    location == "Daily Dealz" |
                    location == "Hippokampos" |
                    location == "Kalami Kafenion" |
                    location == "Kronos Pipe and Irrigation" |
                    location == "Octavio's Office Supplies" |
                    location == "Shoppers' Delight" |
                    location == "Stewart and Sons Fabrication") %>%
      arrange(desc(number_transactions)), "simple",
      caption="Table of location with no traceable coordinates") 
Table 5: Table of location with no traceable coordinates
location                      number_transactions
----------------------------  -------------------
Hippokampos                   171
Abila Zacharo                 72
Kalami Kafenion               64
Brewed Awakenings             30
Shoppers’ Delight             20
Stewart and Sons Fabrication  18
Kronos Pipe and Irrigation    6
Octavio’s Office Supplies     4
Daily Dealz                   1

Figure 9 shows the map marked with blue dots representing the stationary GPS coordinates of all the cars, excluding stops at the employees’ houses. The popular locations can be identified by the density of blue dots at a particular place on the map.

The locations in the transactions table were then tagged with coordinates by cross-referencing the car GPS data and the geo-referenced map.
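As a minimal illustration of how a location coordinate can be derived from nearby stop points, the base R sketch below averages three stop coordinates taken from the first entry of the location tagging code later in this section. It uses a plain arithmetic mean as an assumption; over distances of a few metres this is a close approximation, but it is not the same computation as the `centroid()` call used in the report's code.

```r
# Three stationary GPS points (long, lat) around one location, copied from
# the first entry of the location tagging code
stop_pts <- rbind(c(24.82590612, 36.05102229),
                  c(24.82591819, 36.05092013),
                  c(24.82598413, 36.05097547))
colnames(stop_pts) <- c("long", "lat")

# ASSUMPTION: arithmetic mean of the points as the location coordinate
loc_centroid <- colMeans(stop_pts)
print(round(loc_centroid, 6))
```

Computing both coordinates from one matrix in a single step keeps the longitude and latitude of each location paired.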

## Getting coordinates of car stop positions
first_gps <- gps1 %>% group_by(id) %>% filter(row_number()==1) %>%
  mutate(start_vec=1, stop_vec=0)  %>% ungroup(id)
gps_pts <- gps1 %>% ungroup(id) %>%
  filter(start_diff >5 | stop_diff >5) %>%
  mutate(start_vec=ifelse(start_diff>5,1,0),
         stop_vec=ifelse(stop_diff>5,1,0)) %>%
  add_row(first_gps) %>% group_by(id) %>% arrange(timestamp) %>%
  filter(!(start_vec==1 & stop_vec==1)) %>%
  group_by(id) %>% arrange(timestamp) %>%
  mutate( start.time = ifelse(start_vec== 0 & stop_vec==0, timestamp, NA),
          start.time = ifelse(start_vec==1, timestamp,NA),
          end.time=ifelse(stop_vec==1, timestamp, NA),
          start.gps = ifelse(start_vec==0 & stop_vec==0, gps.coord,NA),
          start.gps = ifelse(start_vec==1, gps.coord,NA),
          end.gps=ifelse(stop_vec==1, gps.coord,NA),
          end.time = ifelse(start_vec==1, lead(end.time), end.time),
          end.gps = ifelse(start_vec==1, lead(end.gps), end.gps)) %>%
  filter(!is.na(start.time))%>%
  mutate(end.gps = ifelse(end.gps=='NULL',start.gps,end.gps),
         end.time = ifelse(is.na(end.time),start.time, end.time),
         start.time= as_datetime(start.time),
         end.time=as_datetime(end.time),
         next.start.time=lead(start.time)) %>%
  dplyr::select(id, date, start.time,
                end.time, start.gps, end.gps, next.start.time) %>%
  mutate(hr=hours(start.time),
         driving.time=round(difftime(end.time,start.time,units='mins'),2),
         dummy=1) %>%
  mutate(start.gps=purrr::map(start.gps, st_point) %>% st_as_sfc(crs=4326))%>%
  mutate(end.gps=purrr::map(end.gps, st_point) %>% st_as_sfc(crs=4326))
car$CarID <- as_factor(car$CarID)
gps_pts <- left_join(gps_pts, car, by=c("id"="CarID"))
gps_stop_points1 <- gps_pts %>%
  mutate(time.stop = difftime(next.start.time, end.time), 
         time.stop = as.numeric(time.stop))%>% 
  filter(time.stop < 300) %>% 
  dplyr::select(id, start.time, start.gps)

## Generate map with the stop positions in blue dots
tmap_mode("view")
map_POI<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_stop_points1)+
  tm_dots(col="blue", shape=30,id="id",
          popup.vars=c("Car ID"="id", 
                       "Stationary timestamp" = "start.time", 
                       "GPS:"="start.gps"))
tmap_leaflet(map_POI)

Figure 9: GPS stationary locations

The car ids are triangulated using location coordinates tabulated as the centroids of the stationary GPS stop points on the map. However, the methodology described earlier for tagging the owners has a few limitations.

  1. In the earlier section, 4 coffee shops were found whose cc transaction timestamps were all 12:00, although the employees actually visited in the morning. The inaccurate cc timestamps make it impossible to tag these transactions to the car GPS data.
  2. In the earlier section, the stationary GPS coordinates of car id 24 were about 500 metres away from Frydos Autosupply n’ More. Locations might not have a dedicated carpark right next to them, and some car owners may walk to the location after parking at a nearby carpark. Hence, the maximum distance from the car stop position to the location coordinates is set at less than 500 metres, a reasonable distance to cover on foot.
  3. Employees might not drive their issued car when they make a transaction with their cc. Examples include car pooling for a meal or using a personal vehicle. This results in incomplete tagging of car id GPS data to the transaction data.

The interactive heatmap in Figure 10 shows the percentage of each cc number's transactions that were successfully matched to the car GPS data under the conditions mentioned earlier. A histogram was also plotted to visualise the distribution of the results. From the two visualisations, we observe that the methodology yields high-percentage matches between some car ids and cc numbers.

# Tagging location coordinates
location_tag <- data.frame(location = c(locations$location,"GAStech"),
 long =c(centroid(rbind(c(24.82590612, 36.05102229),c(24.82591819, 36.05092013),c(24.82598413, 36.05097547)))[1],
         centroid(rbind(c(24.84592966, 36.07443715),c(24.84598782, 36.07434876),c(24.84595026, 36.07437836)))[1],
         centroid(rbind(c(24.85097804, 36.06349268),c(24.85099445, 36.06342076),c(24.85103178, 36.06348173)))[1],
         centroid(rbind(c(24.87617634, 36.07713037),c(24.87621582, 36.07713598),c(24.87619872, 36.07715385)))[1],
         centroid(rbind(c(24.85626503, 36.07529323),c(24.85631411, 36.07523202),c(24.85634841, 36.07528136)))[1],
         centroid(rbind(c(24.85089145, 36.08172086),c(24.85096025, 36.08176242),c(24.85087799, 36.08180554)))[1],
         centroid(rbind(c(24.90119998, 36.05402165),c(24.90128202, 36.05408823),c(24.90116585, 36.05411015)))[1],
         NA,
         centroid(rbind(c(24.88089399, 36.05851786),c(24.88092086, 36.05858619),c(24.8808655, 36.05856303)))[1],
         centroid(rbind(c(24.8951996, 36.07073983),c(24.89517891, 36.07062423),c(24.89526281, 36.07069274)))[1],                         
         centroid(rbind(c(24.88983886, 36.05469486),c(24.88978433, 36.05463184),c(24.88977321, 36.05467589)))[1],
         centroid(rbind(c(24.86416839, 36.07332041),c(24.86417651, 36.07336116),c(24.86419582, 36.07332868)))[1],
         NA,
         centroid(rbind(c(24.86068835, 36.08962196),c(24.86068191, 36.08954231),c(24.8607611, 36.08960361)))[1],
         centroid(rbind(c(24.84132949, 36.07213193),c(24.84134818, 36.07212045),c(24.84134819, 36.07212044)))[1],
         centroid(rbind(c(24.905573, 36.06044638),c(24.90561679, 36.06033304),c(24.90568587, 36.06040053)))[1],
         centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[1],
         centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[1],
         centroid(rbind(c(24.90096913, 36.05842562),c(24.90107066, 36.05844726),c(24.90097455, 36.05850897)))[1],
         centroid(rbind(c(24.88586605, 36.063639),c(24.88595361, 36.06364584),c(24.88586737, 36.06371539)))[1],
         centroid(rbind(c(24.85756422, 36.07660977),c(24.85763811, 36.07664766),c(24.857573, 36.07669909)))[1],
         centroid(rbind(c(24.87330651, 36.06751231),c(24.87335583, 36.06750587),c(24.87333867, 36.06755141)))[1],                    
         centroid(rbind(c(24.85237319, 36.06582037),c(24.85241027, 36.06582475),c(24.85237372, 36.06584816)))[1],
         centroid(rbind(c(24.89986767, 36.05442391),c(24.89996154, 36.05448329),c(24.89987365, 36.05453273)))[1],
         centroid(rbind(c(24.84983351, 36.06587998),c(24.84983936, 36.06582196),c(24.8497762, 36.06583535)))[1],
         NA,
         centroid(rbind(c(24.83307421, 36.0653098),c(24.83314028, 36.06523446),  c(24.84143955, 36.06403449),c(24.84141463, 36.06410072)))[1],
         centroid(rbind(c(24.88551872, 36.05840982),c(24.88542068, 36.0584603),  c(24.88553455, 36.05844325)))[1],
         NA,
         centroid(rbind(c(24.87077341, 36.05196196),c(24.87081903, 36.05192066),c(24.87083665, 36.05197804)))[1],
         centroid(rbind(c(24.85227441, 36.06324941),c(24.85226894, 36.06330479),c(24.8523291, 36.0632684)))[1],
         NA,NA,
         centroid(rbind(c(24.87148791, 36.06774029),c(24.8714995, 36.06774623),c(24.87149104, 36.06776587)))[1],
         centroid(rbind(c(24.87956897, 36.04802112),c(24.8795714, 36.04804908),  c(24.8795745, 36.0480309)))[1]),
 lat = c(centroid(rbind(c(24.82590612, 36.05102229),c(24.82591819, 36.05092013),c(24.82598413, 36.05097547)))[2],
         centroid(rbind(c(24.84592966, 36.07443715),c(24.84598782, 36.07434876),c(24.84595026, 36.07437836)))[2],
         centroid(rbind(c(24.85097804, 36.06349268),c(24.85099445, 36.06342076),c(24.85103178, 36.06348173)))[2],
         centroid(rbind(c(24.87617634, 36.07713037),c(24.87621582, 36.07713598),c(24.87619872, 36.07715385)))[2],
         centroid(rbind(c(24.85626503, 36.07529323),c(24.85631411, 36.07523202),c(24.85634841, 36.07528136)))[2],
         centroid(rbind(c(24.85089145, 36.08172086),c(24.85096025, 36.08176242),c(24.85087799, 36.08180554)))[2],
         centroid(rbind(c(24.90119998, 36.05402165),c(24.90128202, 36.05408823),c(24.90116585, 36.05411015)))[2],
         NA,
         centroid(rbind(c(24.88089399, 36.05851786),c(24.88092086, 36.05858619),c(24.8808655, 36.05856303)))[2],
         centroid(rbind(c(24.8951996, 36.07073983),c(24.89517891, 36.07062423),c(24.89526281, 36.07069274)))[2],
         centroid(rbind(c(24.88983886, 36.05469486),c(24.88978433, 36.05463184),c(24.88977321, 36.05467589)))[2],
         centroid(rbind(c(24.86416839, 36.07332041),c(24.86417651, 36.07336116),c(24.86419582, 36.07332868)))[2],
         NA,
         centroid(rbind(c(24.86068835, 36.08962196),c(24.86068191, 36.08954231),c(24.8607611, 36.08960361)))[2],
         centroid(rbind(c(24.84132949, 36.07213193),c(24.84134818, 36.07212045),c(24.84134819, 36.07212044)))[2],
         centroid(rbind(c(24.905573, 36.06044638),c(24.90561679, 36.06033304),c(24.90568587, 36.06040053)))[2],
         centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[2],
         centroid(rbind(c(24.85804364, 36.05970763),c(24.8580772, 36.05975308),c(24.8579808, 36.05976284)))[2],
         centroid(rbind(c(24.90096913, 36.05842562),c(24.90107066, 36.05844726),c(24.90097455, 36.05850897)))[2],
         centroid(rbind(c(24.88586605, 36.063639),c(24.88595361, 36.06364584),c(24.88586737, 36.06371539)))[2],
         centroid(rbind(c(24.85756422, 36.07660977),c(24.85763811, 36.07664766),c(24.857573, 36.07669909)))[2],
         centroid(rbind(c(24.87330651, 36.06751231),c(24.87335583, 36.06750587),c(24.87333867, 36.06755141)))[2],
         centroid(rbind(c(24.85237319, 36.06582037),  c(24.85241027, 36.06582475),c(24.85237372, 36.06584816)))[2],
         centroid(rbind(c(24.89986767, 36.05442391),c(24.89996154, 36.05448329),  c(24.89987365, 36.05453273)))[2],
         centroid(rbind(c(24.84983351, 36.06587998),c(24.84983936, 36.06582196),c(24.8497762, 36.06583535)))[2],
          NA,
         centroid(rbind(c(24.83307421, 36.0653098),c(24.83314028, 36.06523446),  c(24.84143955, 36.06403449),c(24.84141463, 36.06410072)))[2],
         centroid(rbind(c(24.88551872, 36.05840982),c(24.88542068, 36.0584603),  c(24.88553455, 36.05844325)))[2],
          NA,
         centroid(rbind(c(24.87077341, 36.05196196),c(24.87081903, 36.05192066),c(24.87083665, 36.05197804)))[2],
         centroid(rbind(c(24.85227441, 36.06324941),c(24.85226894, 36.06330479),c(24.8523291, 36.0632684)))[2],
         NA,NA,
         centroid(rbind(c(24.87148791, 36.06774029),c(24.8714995, 36.06774623),c(24.87149104, 36.06776587)))[2],
         centroid(rbind(c(24.87956897, 36.04802112),c(24.8795714, 36.04804908),  c(24.8795745, 36.0480309)))[2]))
location_tag <- location_tag %>% na.omit()
location_tag <- st_as_sf(location_tag, coords=c("long","lat"), crs=4326)

## join GPS data with transaction data with location coordinates
final_trans_gps <- inner_join(final_trans_1, location_tag, by=c("location")) %>%
  rename(loc.coord=geometry)
## Join with car gps
gps_match <- left_join(final_trans_gps, gps_pts, by=c("date"))
## Tag the location to car gps
gps_match1 <- gps_match %>% group_by(last4ccnum) %>% arrange(datetime) %>%
  filter(datetime > end.time & datetime <= next.start.time + minutes(30)) %>%
  mutate(diff.dist = st_distance(loc.coord, end.gps, by_element=TRUE),
         diff.dist = as.numeric(diff.dist)) %>%
  filter(diff.dist <500)
tagging <-gps_match1 %>%group_by(last4ccnum, id)%>%
  summarize(tag=n()) %>% arrange(desc(tag))
## Get total count of transactions minus the 4 locations per cc num
trans_collapse <- cc %>%
  filter(!(location %in% c("Bean There Done That",
                           "Brewed Awakenings",
                           "Coffee Shack",
                           "Jack's Magical Beans"))) %>%
  group_by(last4ccnum) %>% summarize(total=n())
## Percentage of each cc number's transactions matched to each car id
tagging_cc_gps <- left_join(tagging, trans_collapse, by=c("last4ccnum")) %>%
  mutate(percent=round(tag/total*100,2))

tag_plot<-ggplot(tagging_cc_gps, aes(x=id, y=last4ccnum,fill=percent))+
  geom_tile() + scale_fill_gradient(low="sienna1", high="navyblue") +
  xlab("Car ID") +ylab("CC last 4 number")+ 
  labs(fill="% match")
histogram<-ggplot(tagging_cc_gps,aes(percent))+geom_histogram(binwidth=5)+
  stat_function(fun=dnorm,aes(color="red"),
                args=list(mean=mean(tagging_cc_gps$percent),
                sd=sd(tagging_cc_gps$percent)))
ggplotly(tag_plot) %>% layout(hoverlabel=list(bgcolor="white"))

Figure 10: Car GPS tagging to CC number

ggplotly(histogram) %>% layout(hoverlabel=list(bgcolor="white"))


Hence, we can reasonably infer that matches of over 75% are accurate. However, as there are more cc owners (55 unique owners) than car owners (35 unique car ids) and the truck drivers share vehicles (5 unique truck ids), we drop the trucks with car id 100 and above. The heatmap in Figure 10 reveals that car ids 23, 29 and 30 each match more than one cc number, and that car id 28 has no match above 75%.

From Table 6, car id 23 matches three unique cc numbers at over 75%. The highest match, cc 3484 at 91.43%, indicates a high probability of ownership, so the matches to cc 8202 and 8411 are dropped.

For car ids 29 and 30, the match percentages of the competing cc numbers are all relatively high and differ by no more than 10 percentage points. The GPS map locations are investigated further to decide which match to retain.
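To make the comparison between competing matches explicit, the gap between the best and second-best match can be computed per car id. The base R sketch below uses the values from Table 6; it is only an illustration of the comparison and does not apply any decision rule.

```r
# Match percentages copied from Table 6
matches <- data.frame(
  last4ccnum = c(3484, 8202, 8411, 3547, 5921, 6901, 8202),
  id         = c(23, 23, 23, 29, 29, 30, 30),
  percent    = c(91.43, 78.79, 81.25, 90.00, 100.00, 86.49, 78.79)
)

# Gap (in percentage points) between the best and second-best cc per car id
gap_by_car <- sapply(split(matches$percent, matches$id), function(p) {
  p <- sort(p, decreasing = TRUE)
  p[1] - p[2]
})
print(round(gap_by_car, 2))
```

A small gap means the percentages alone cannot separate the candidates, which is why the GPS map is consulted for those car ids.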

## Get the match of car id to cc last4ccnum
tagging <- tagging_cc_gps %>% mutate(id=as.character(id), id=as.numeric(id)) %>% 
  filter(percent>=75 & id<100)
knitr::kable(tagging %>% filter(id==23 | id==29 | id==30) %>% 
               arrange(id), "simple",
      caption="Table of car ids matched to more than one cc number")
Table 6: Table of car ids matched to more than one cc number
last4ccnum  id  tag  total  percent
----------  --  ---  -----  -------
3484        23   32     35    91.43
8202        23   26     33    78.79
8411        23   26     32    81.25
3547        29   18     20    90.00
5921        29   14     14   100.00
6901        30   32     37    86.49
8202        30   26     33    78.79
final_tagging <- tagging %>% 
  filter(!(last4ccnum==8202 & id==23), !(last4ccnum==8411 & id ==23))

The low cc transaction match for car id 28 was investigated in Figure 11, which revealed that the GPS coordinates of car id 28 are very noisy. The noise produces a wider spread of GPS lines on the map and an incoherent zig-zag path. This most probably indicates a faulty GPS signal on the car.

Secondly, the stop positions are not accurate. For example, the cluster of GPS stop coordinates at the extreme south of the map should be at GAStech, so the stop coordinates appear to be shifted in the north-west direction. The most probable explanation is a faulty GPS system, since the GPS points are noisy and not correctly geo-referenced on the map.

## Map geometry for original car id 28 data
gps_sf5 <- gps_sf %>% filter(id==28)
gps_path5 <- gps_sf5 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_28_points <- gps1 %>% filter(id ==28) %>% 
  filter(start_diff>5 | stop_diff >5) %>% 
  mutate(start_vec=ifelse(start_diff>5,1,0), stop_vec=ifelse(stop_diff>5,1,0))

## Plot interactive map
tmap_mode("view")
map5<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path5)+
  tm_lines() +
  tm_shape(gps_28_points)+
  tm_dots(col="blue")
tmap_leaflet(map5)

Figure 11: Original GPS for car id 28

Figure 12 shows the GPS movement data for car id 28 after re-calibrating its GPS coordinates. The re-calibrated GPS data is then matched against the cc transaction data to infer which cc belongs to the owner of car id 28.
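The re-calibration code shifts every point of car id 28 by a constant offset. One plausible way to estimate such an offset, shown as a base R sketch, is to compare the average of the raw stop points that should be at a known location against that location's reference coordinate. The raw stop points below are hypothetical values chosen for illustration, and the reference is the GAStech coordinate from the location tagging code rounded to four decimals.

```r
# Reference position where the car is expected to park (approximate GAStech
# coordinate, rounded)
reference <- c(long = 24.8796, lat = 36.0480)

# HYPOTHETICAL raw stop points that are shifted to the north-west
observed <- rbind(c(24.8746, 36.0500),
                  c(24.8745, 36.0501),
                  c(24.8747, 36.0499))
colnames(observed) <- c("long", "lat")

# Constant offset that moves the average observed stop onto the reference
offset <- reference - colMeans(observed)
print(round(offset, 4))

# Apply the same offset to every point
corrected <- sweep(observed, 2, offset, "+")
```

A single constant shift corrects the systematic bias in the stop positions but not the point-to-point noise, so the re-calibrated path remains jagged.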

From the map in Figure 12, the distinctive observation is that car id 28 visited Ahaggo Museum on the 18th and 19th of Jan and frequently patronised Jack’s Magical Beans and Ouzeri Elian over the two weeks.

In the cc transaction table, a search for Ahaggo Museum revealed that cc 1286, 7384 and 9241 made transactions on the 18th and 19th of Jan. Next, a search for Jack’s Magical Beans shows that, of these three, only cc 9241 made transactions at the location. Lastly, a search for Ouzeri Elian reveals that cc 9241 made 6 transactions there. Hence, we are confident in inferring that the owner of car id 28 is the owner of cc 9241.
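The elimination above amounts to intersecting the sets of cc numbers seen at each location. A minimal base R sketch follows; the Ahaggo Museum set is the one stated in the text, while the other two sets are partial lists assembled only from cards that appear elsewhere in this report (Table 4 for Jack's Magical Beans, and Tables 7 and 8 for Ouzeri Elian), not the full search results.

```r
# cc numbers with transactions at Ahaggo Museum on 18-19 Jan (from the text)
cards_ahaggo <- c(1286, 7384, 9241)

# PARTIAL lists, limited to cards shown in this report
cards_jacks  <- c(9241, 8156, 6899, 2463)  # Jack's Magical Beans, Table 4
cards_ouzeri <- c(9241, 3547, 5921, 6901)  # Ouzeri Elian, text and Tables 7-8

# Cards present at all three locations
candidates <- Reduce(intersect,
                     list(cards_ahaggo, cards_jacks, cards_ouzeri))
print(candidates)
```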

## Map geometry for re-calibrated Car id 28
gps28 <- gps %>% filter(id==28) %>% 
  mutate(long = long +0.005,
         lat=lat-0.002)
gps_sf28 <- st_as_sf(gps28, coords=c("long","lat"), crs=4326)
gps_path28 <- gps_sf28 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps28_pt <- st_as_sf(gps28, coords=c("long","lat"), crs=4326)
gps28_pt <- gps28_pt %>% 
  group_by(id) %>% arrange(timestamp) %>%
  mutate(start_diff= as.numeric(timestamp - lag(timestamp,default=first(timestamp)))/60,
         stop_diff= as.numeric(lead(timestamp)-timestamp)/60,
         date = as.Date(timestamp)) %>%
  rename(gps.coord=geometry) %>% 
  filter(start_diff>5 | stop_diff >5) %>% 
  mutate(start_vec=ifelse(start_diff>5,1,0), stop_vec=ifelse(stop_diff>5,1,0))

## Plot interactive map
tmap_mode("view")
map6<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path28)+
  tm_lines() +
  tm_shape(gps28_pt)+
  tm_dots(col="blue")
tmap_leaflet(map6)

Figure 12: Re-calibrated GPS for car id 28

final_tagging <- final_tagging %>% 
  dplyr::select(last4ccnum, id) %>% 
  mutate(last4ccnum = as.character(last4ccnum),
         id = as.character(id)) %>% 
  bind_rows(c(last4ccnum="9241", id="28"))

Next, we focus on car id 29, which matches 90% of the transactions of cc 3547 and 100% of those of cc 5921. The high proportion of matches on both credit cards warrants further investigation.

Looking at Table 7 for both cc numbers, cc 3547 has transactions from 12/01/2014 to 19/01/2014 and cc 5921 has transactions from 06/01/2014 to 10/01/2014. Cross-referencing the GPS data for car id 29 in Figure 13, the transactions of both cards match the GPS data of car id 29. A possible deduction is that the owner of car id 29 used both cards, as the transaction dates of the two cards do not overlap; for instance, the owner may have switched from cc 5921 to cc 3547 after 10/01/2014. Note that neither card recorded a transaction on 11/01/2014, which could be missing data. Hence, we tag car id 29 to both cc 5921 and cc 3547.
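The non-overlap argument can be verified directly from the first and last transaction dates in Table 7. The base R sketch below uses a hypothetical helper, `ranges_overlap()`, and also lists the days that fall between the two ranges.

```r
# First and last transaction dates per card, read from Table 7
range_3547 <- as.Date(c("2014-01-12", "2014-01-19"))
range_5921 <- as.Date(c("2014-01-06", "2014-01-10"))

# Two closed date ranges overlap if each starts before the other ends
ranges_overlap <- function(a, b) a[1] <= b[2] && b[1] <= a[2]
overlap <- ranges_overlap(range_3547, range_5921)
print(overlap)

# Days strictly between the end of cc 5921 and the start of cc 3547
gap_days <- seq(range_5921[2] + 1, range_3547[1] - 1, by = "day")
print(gap_days)
```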

gps_sf29 <- gps_sf %>% filter(id==29)
gps_path29 <- gps_sf29 %>% group_by(id) %>% 
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps29_pt <- gps_stop_points1 %>% filter(id==29)
tmap_mode("view")
map7<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path29)+
  tm_lines() +
  tm_shape(gps29_pt)+
  tm_dots(col="blue")
cc3547 <- cc %>% filter(last4ccnum==3547) %>% dplyr::select(-datetime, -date)
cc5921 <- cc %>% filter(last4ccnum==5921) %>% dplyr::select(-datetime, -date)
knitr::kable(list(cc3547,cc5921),caption="Transactions for CC 3547 & 5921")
Table 7: Transactions for CC 3547 & 5921
timestamp         location            price  last4ccnum
----------------  -----------------  ------  ----------
1/12/2014 16:08   Shoppers’ Delight   51.50  3547
1/12/2014 20:11   Katerina’s Cafe     67.14  3547
01/13/2014 07:40  Coffee Cameleon     19.93  3547
01/13/2014 13:52  Katerina’s Cafe     29.55  3547
01/13/2014 19:50  Katerina’s Cafe     89.83  3547
01/14/2014 07:37  Coffee Cameleon      9.80  3547
01/14/2014 13:41  Katerina’s Cafe     75.46  3547
01/14/2014 20:17  Katerina’s Cafe     36.95  3547
01/15/2014 07:51  Coffee Cameleon     14.47  3547
01/15/2014 13:58  Abila Zacharo       33.80  3547
01/15/2014 21:21  Katerina’s Cafe     27.48  3547
01/16/2014 07:38  Coffee Cameleon     67.19  3547
01/16/2014 13:27  Abila Zacharo       31.31  3547
01/16/2014 19:47  Katerina’s Cafe     34.34  3547
01/17/2014 07:38  Coffee Cameleon     10.27  3547
01/17/2014 13:42  Katerina’s Cafe     21.01  3547
01/18/2014 13:34  Ouzeri Elian        25.75  3547
01/18/2014 15:31  General Grocer     477.60  3547
01/18/2014 19:53  Katerina’s Cafe     76.10  3547
01/19/2014 18:54  Katerina’s Cafe     72.25  3547

timestamp        location         price  last4ccnum
---------------  ---------------  -----  ----------
1/6/2014 7:49    Coffee Cameleon   8.39  5921
1/6/2014 13:48   Ouzeri Elian     30.87  5921
1/6/2014 20:33   Katerina’s Cafe  15.52  5921
1/7/2014 7:46    Coffee Cameleon   9.10  5921
1/7/2014 13:54   Gelatogalore     88.97  5921
1/7/2014 20:32   Katerina’s Cafe  19.53  5921
1/8/2014 7:52    Coffee Cameleon  12.26  5921
1/8/2014 13:29   Kalami Kafenion  24.08  5921
1/8/2014 20:42   Katerina’s Cafe  92.83  5921
1/9/2014 7:37    Coffee Cameleon  19.99  5921
1/9/2014 14:13   Guy’s Gyros      17.44  5921
1/9/2014 19:30   Katerina’s Cafe  26.60  5921
1/10/2014 7:47   Coffee Cameleon  11.54  5921
1/10/2014 19:56  Katerina’s Cafe  21.89  5921
tmap_leaflet(map7)

Figure 13: GPS for car id 29

Lastly, we will look at car id 30 with cc 6901 and 8202. The GPS data for car id 30 was visualise in Figure 14 and the transaction from cc 6901 and 8202 in table 8.

Comparing the GPS data map and cc translation data, we focused on locations with a lower frequency of visit and locations in a less congested area for easier verification. From the 3 locations and transaction details below, we can deduce that cc 6901 matches car id 30.

  1. GPS data showed a visit to Ouzeri Elian on 07/01/2014 and only cc 6901 has a matching transaction.
  2. GPS data showed visits to Kalami Kafenion on 15/01/2014 and 18/01/2014 and only cc 6901 has matching transactions on both days.
  3. GPS data showed visits to Hippokampos on 10/01/2014 and 14/01/2014 and only cc 6901 has matching transactions on both days.
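
The comparison in the list above can be sketched in a few lines of base R. This is only an illustration, not the code used for the analysis: the data frames `visits` and `tx` are hypothetical names, hard-coded from the GPS visits listed above and from the rows of Table 8 at those three locations. The snippet counts how many of the GPS visits each candidate card can explain.

```r
# GPS visits of car id 30 used for verification (taken from the list above)
visits <- data.frame(
  location = c("Ouzeri Elian", "Kalami Kafenion", "Kalami Kafenion",
               "Hippokampos", "Hippokampos"),
  date = as.Date(c("2014-01-07", "2014-01-15", "2014-01-18",
                   "2014-01-10", "2014-01-14")),
  stringsAsFactors = FALSE
)

# Transactions of both candidate cards at those three locations (Table 8)
tx <- data.frame(
  last4ccnum = c(rep("6901", 5), rep("8202", 5)),
  location = c("Ouzeri Elian", "Hippokampos", "Hippokampos",
               "Kalami Kafenion", "Kalami Kafenion",
               "Hippokampos", "Hippokampos",
               "Kalami Kafenion", "Kalami Kafenion", "Kalami Kafenion"),
  date = as.Date(c("2014-01-07", "2014-01-10", "2014-01-14",
                   "2014-01-15", "2014-01-18",
                   "2014-01-06", "2014-01-14",
                   "2014-01-08", "2014-01-16", "2014-01-18")),
  stringsAsFactors = FALSE
)

# Keep only transactions that coincide with a GPS visit (same place, same day)
matched <- merge(visits, tx, by = c("location", "date"))

# Number of GPS visits explained by each card
match_count <- sapply(split(matched, matched$last4ccnum), nrow)
print(match_count)
```

With these inputs, cc 6901 explains all five visits while cc 8202 explains only two, which is why cc 6901 is tagged to car id 30.
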
gps_sf_30 <- gps_sf %>% filter(id==30)
gps_path_30 <- gps_sf_30 %>% group_by(id) %>%
  summarize(m = mean(timestamp), do_union=FALSE) %>% st_cast("LINESTRING")
gps_stop_points30 <- gps_pts %>%
  mutate(time.stop = difftime(next.start.time, end.time), 
         time.stop = as.numeric(time.stop))%>% 
  filter(time.stop < 300 & id==30) %>% 
  dplyr::select(id, start.time, start.gps)

## Plot interactive map
tmap_mode("view")
map8<-tm_shape(bgmap) +
  tm_rgb(bgmap, r=1, g=2, b=3, alpha=NA, saturation=1, 
         interpolate=TRUE, max.value=255) +
  tm_shape(gps_path_30) +
  tm_lines(col ="red") +
  tm_shape(gps_stop_points30)+
  tm_dots(col="blue", shape=30)
tmap_leaflet(map8)

Figure 14: GPS for car id 30

cc6901 <- cc %>% filter(last4ccnum==6901) %>% dplyr::select(-datetime, -date)
cc8202 <- cc %>% filter(last4ccnum==8202) %>% dplyr::select(-datetime, -date)
knitr::kable(list(cc6901,cc8202),caption="Transactions for CC 6901 & 8202")
Table 8: Transactions for CC 6901 & 8202
timestamp location price last4ccnum
1/6/2014 8:07 Brew’ve Been Served 5.66 6901
1/6/2014 14:17 Katerina’s Cafe 19.65 6901
1/6/2014 20:09 Guy’s Gyros 11.94 6901
1/7/2014 8:18 Brew’ve Been Served 47.74 6901
1/7/2014 14:09 Ouzeri Elian 59.51 6901
1/7/2014 20:20 Frydos Autosupply n’ More 312.73 6901
1/8/2014 8:03 Brew’ve Been Served 10.15 6901
1/8/2014 13:51 Abila Zacharo 30.85 6901
1/8/2014 20:57 Guy’s Gyros 10.28 6901
1/9/2014 7:58 Brew’ve Been Served 19.47 6901
1/9/2014 13:56 Guy’s Gyros 8.87 6901
1/9/2014 20:20 Frydos Autosupply n’ More 31.24 6901
1/10/2014 8:01 Brew’ve Been Served 5.31 6901
1/10/2014 13:58 Hippokampos 39.89 6901
1/10/2014 20:09 Guy’s Gyros 29.81 6901
1/11/2014 14:19 Abila Zacharo 45.20 6901
1/11/2014 20:26 Frydos Autosupply n’ More 261.00 6901
1/12/2014 13:31 Guy’s Gyros 34.74 6901
1/12/2014 16:27 Ahaggo Museum 120.20 6901
01/13/2014 08:21 Brew’ve Been Served 13.19 6901
01/13/2014 14:13 Guy’s Gyros 12.76 6901
01/13/2014 20:45 Shoppers’ Delight 144.40 6901
01/14/2014 08:13 Brew’ve Been Served 12.31 6901
01/14/2014 13:57 Hippokampos 17.18 6901
01/14/2014 20:43 Frydos Autosupply n’ More 146.74 6901
01/15/2014 08:14 Brew’ve Been Served 18.58 6901
01/15/2014 14:13 Kalami Kafenion 28.82 6901
01/16/2014 08:03 Brew’ve Been Served 16.67 6901
01/16/2014 13:55 Abila Zacharo 8.43 6901
01/16/2014 20:09 Guy’s Gyros 28.27 6901
01/17/2014 08:17 Brew’ve Been Served 5.29 6901
01/17/2014 13:55 Guy’s Gyros 32.31 6901
01/17/2014 19:46 Guy’s Gyros 16.83 6901
01/18/2014 14:17 Kalami Kafenion 53.36 6901
01/18/2014 20:07 General Grocer 108.49 6901
01/19/2014 14:20 Abila Zacharo 47.80 6901
01/19/2014 20:51 Guy’s Gyros 39.60 6901
timestamp location price last4ccnum
1/6/2014 8:17 Brew’ve Been Served 15.39 8202
1/6/2014 13:58 Hippokampos 38.25 8202
1/6/2014 20:12 Frydos Autosupply n’ More 80.85 8202
1/7/2014 7:58 Brew’ve Been Served 17.40 8202
1/7/2014 13:57 Katerina’s Cafe 37.44 8202
1/7/2014 20:13 Katerina’s Cafe 65.02 8202
1/8/2014 8:01 Brew’ve Been Served 3.92 8202
1/8/2014 13:42 Kalami Kafenion 22.49 8202
1/8/2014 20:35 Katerina’s Cafe 16.93 8202
1/9/2014 7:56 Brew’ve Been Served 98.25 8202
1/9/2014 14:09 Guy’s Gyros 27.69 8202
1/9/2014 20:22 Katerina’s Cafe 29.82 8202
1/10/2014 8:17 Brew’ve Been Served 8.47 8202
1/10/2014 14:11 Gelatogalore 32.19 8202
1/10/2014 20:02 Frydos Autosupply n’ More 43.65 8202
1/11/2014 20:06 Katerina’s Cafe 52.45 8202
1/12/2014 20:43 Frydos Autosupply n’ More 161.96 8202
01/13/2014 08:23 Brew’ve Been Served 19.89 8202
01/13/2014 14:00 Gelatogalore 36.24 8202
01/14/2014 07:53 Brew’ve Been Served 9.53 8202
01/14/2014 14:16 Hippokampos 16.73 8202
01/14/2014 20:42 Katerina’s Cafe 46.61 8202
01/15/2014 08:06 Brew’ve Been Served 3.47 8202
01/15/2014 13:46 Guy’s Gyros 16.58 8202
01/15/2014 20:26 Katerina’s Cafe 61.61 8202
01/16/2014 08:08 Brew’ve Been Served 90.33 8202
01/16/2014 13:45 Kalami Kafenion 9.27 8202
01/16/2014 20:39 Katerina’s Cafe 30.56 8202
01/17/2014 08:09 Brew’ve Been Served 9.30 8202
01/17/2014 14:00 Guy’s Gyros 15.12 8202
01/17/2014 20:19 Katerina’s Cafe 36.12 8202
01/18/2014 14:02 Kalami Kafenion 42.73 8202
01/18/2014 19:46 Katerina’s Cafe 11.19 8202
final_tagging <- final_tagging %>% ungroup() %>% 
  filter(!(last4ccnum=="8202"& id=="30")) %>% 
  mutate(id=as_factor(id)) %>% 
  left_join(car, by=c("id"="CarID")) %>% 
  mutate(name=paste(LastName,FirstName))

The tagging of all 35 car owners (excluding truck drivers) has been completed and verified.

4. Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships.

To visualise potential relationships, we used network analysis. Figure 15 shows an interactive network of each employee (by car ID) and the locations where they made transactions with their GAStech cc. From the network analysis over the two weeks of data, we can uncover some relationships among employees.

cc_data <- cc %>% mutate(day=lubridate::day(datetime), hour=lubridate::hour(datetime))
sources <- cc_data %>% mutate(hour=lubridate::hour(datetime)) %>% 
  distinct(last4ccnum) %>% left_join(final_tagging, by=c("last4ccnum")) %>% 
  mutate(name=paste(LastName,FirstName)) %>% 
  rename(label = name) %>% drop_na(id) %>%
  mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),"Driver",CurrentEmploymentType))
destinations <- cc_data  %>% 
  distinct(location) %>%
  rename(label = location)
cc_nodes <- full_join(sources, 
                      destinations, 
                      by = "label") %>% rename(car_id=id)
cc_nodes <- cc_nodes %>% 
  rowid_to_column("id") %>%
  mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),
                                      "Locations",CurrentEmploymentType),
         title=label) %>% 
  rename(group=CurrentEmploymentType)
edges <- cc_data %>% 
  mutate(last4ccnum = as.character(last4ccnum)) %>%  
  filter(last4ccnum %in% final_tagging$last4ccnum) %>% 
  group_by(last4ccnum, location, day, hour) %>%
  summarise(weight = n()) %>% 
  ungroup()
cc_edges <- edges %>% 
  inner_join(cc_nodes,by = c("last4ccnum")) %>% 
  rename(from = id)
cc_edges <- cc_edges %>% 
  inner_join(cc_nodes,by = c("location" = "label")) %>% 
  rename(to = id) %>% 
  dplyr::select(from, to,day, hour, weight) %>% 
  mutate(time_bin = case_when(hour>=0&hour<6~"Midnight",
                              hour>=6&hour<12~"Morning",
                              hour>=12&hour<18~"Afternoon",
                              hour>=18~"Night"),
         weekday.weekend = ifelse(day %in% c(11,12,18,19),"Weekend","Weekday"),
         day.week = case_when(day==6|day==13~"Monday",
                              day==7|day==14~"Tuesday",
                              day==8|day==15~"Wednesday",
                              day==9|day==16~"Thursday",
                              day==10|day==17~"Friday",
                              day==11|day==18~"Saturday",
                              day==12|day==19~"Sunday",))

visNetwork(cc_nodes, cc_edges, main="Network analysis by location and employee") %>% 
  visIgraphLayout(layout = "layout_on_grid") %>% 
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>% 
  visLegend()

Figure 15: Network analysis by location and employee

  1. Desafio Golf Course was visited by GAStech Executives only. Based on figure 4, cc transactions at the location were made on Sundays only. Hence, we can infer that all five executives of GAStech might have some relationship outside working hours, gathering at the Desafio Golf Course on both Sundays. They might be playing golf or holding a regular gathering at the location.
  2. Chostus Hotel was visited by Orilla Elsa, Tempestad Brand and Sanjorge Jr. Sten over the 2 weeks of data. Table 9 below shows the transactions at Chostus Hotel only, and we can observe that Orilla Elsa and Tempestad Brand made transactions on the same 4 dates during lunch, each time within 25 minutes of each other. The transactions were relatively expensive for a lunch meal in comparison to other food and beverage locations in Abila. Alternatively, they might have paid for a hotel room during their visit to the location. Furthermore, both of them are from the same department with the same title in GAStech and there might be some relationship between them.
knitr::kable(cc %>% mutate(last4ccnum=as.character(last4ccnum)) %>% 
               left_join(final_tagging, by=c("last4ccnum")) %>% 
               filter(location=="Chostus Hotel") %>% 
               select(name, CurrentEmploymentType, CurrentEmploymentTitle,
                      location, timestamp, price),
             caption="Table of transaction at Chostus Hotel")
Table 9: Table of transaction at Chostus Hotel
name CurrentEmploymentType CurrentEmploymentTitle location timestamp price
Orilla Elsa Engineering Drill Technician Chostus Hotel 1/8/2014 12:56 107.51
Tempestad Brand Engineering Drill Technician Chostus Hotel 1/8/2014 13:19 111.89
Tempestad Brand Engineering Drill Technician Chostus Hotel 1/10/2014 13:08 133.25
Orilla Elsa Engineering Drill Technician Chostus Hotel 1/10/2014 13:11 197.41
Orilla Elsa Engineering Drill Technician Chostus Hotel 01/14/2014 13:17 109.54
Tempestad Brand Engineering Drill Technician Chostus Hotel 01/14/2014 13:21 113.08
Tempestad Brand Engineering Drill Technician Chostus Hotel 01/17/2014 13:49 114.22
Orilla Elsa Engineering Drill Technician Chostus Hotel 01/17/2014 13:54 159.62
Sanjorge Jr. Sten Executive President/CEO Chostus Hotel 01/18/2014 12:03 600.00
  3. Bean There Done That had transactions only from the engineering department (yellow nodes in figure 15). It is the furthest location from GAStech, yet a certain group of customers still visits and purchases from it. Visualising the stationary GPS data for the 7 customers from the engineering team in figure 16, we observe that 5 of the 7 customers reside in the area of Carnero Street and Parla Park, whereas the remaining 2, Frente Birgitta and Dedos Lidelse, reside between Arkadiou Park and Sannan Park. The residences of these 2 customers are at the same coordinates, far away from Bean There Done That. That they both still patronise it might signify some relationship between them.

Figure 16: Stationary GPS points of Bean There Done That customers

To investigate unofficial relationships, we focus on transactions made after working hours. The network was narrowed to transactions made on weekday nights, and to dining locations that had transactions in the afternoon or night, to reduce clutter. Figure 17 shows the network for weekday night transactions only. The edges connecting employees to locations are coloured by day to show whether any group of employees visited a particular location on the same night.
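
The weekday-night labelling used for this filter can be sketched with a small base R helper. This is a minimal sketch under assumptions: `label_time()` is a hypothetical function, not part of the analysis code, and it derives the weekend flag from the calendar weekday rather than from hard-coded day numbers. The time bins follow the same hour boundaries used in this post (0-6 Midnight, 6-12 Morning, 12-18 Afternoon, 18-24 Night).

```r
# Hypothetical helper: label each timestamp with a time bin and weekday/weekend
label_time <- function(datetime) {
  lt <- as.POSIXlt(datetime)
  # Left-closed intervals: [0,6), [6,12), [12,18), [18,24)
  time_bin <- cut(lt$hour, breaks = c(0, 6, 12, 18, 24), right = FALSE,
                  labels = c("Midnight", "Morning", "Afternoon", "Night"))
  data.frame(
    time_bin = as.character(time_bin),
    # POSIXlt wday: 0 = Sunday, 6 = Saturday
    weekday.weekend = ifelse(lt$wday %in% c(0, 6), "Weekend", "Weekday"),
    stringsAsFactors = FALSE
  )
}

# Three timestamps taken from the transaction tables in this post
x <- as.POSIXct(c("2014-01-08 21:16:00", "2014-01-18 12:03:00",
                  "2014-01-13 07:40:00"), tz = "UTC")
res <- label_time(x)
print(res)

# Rows that would survive the weekday-night filter
weekday_night <- res$time_bin == "Night" & res$weekday.weekend == "Weekday"
```

Only the first timestamp (a Wednesday at 21:16) passes the weekday-night filter; the second falls on a Saturday afternoon and the third on a Monday morning.
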

sources <- cc_data %>% mutate(hour=lubridate::hour(datetime)) %>% 
  distinct(last4ccnum) %>% left_join(final_tagging, by=c("last4ccnum")) %>% 
  mutate(name=paste(LastName,FirstName)) %>% 
  rename(label = name) %>% drop_na(id) %>% 
  mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),
                                      "Driver",CurrentEmploymentType))
destinations <- cc_data  %>% 
  filter(location %in% c("Ouzeri Elian", "Guy's Gyros", "Katerina's Cafe",
                         "Hippokampos", "Abila Zacharo", "Gelatogalore",
                         "Kalami Kafenion", "Chostus Hotel")) %>% 
  distinct(location) %>%
  rename(label = location)
cc_nodes <- full_join(sources, 
                      destinations, 
                      by = "label") %>% rename(car_id=id)
cc_nodes <- cc_nodes %>% 
  rowid_to_column("id") %>%
  mutate(CurrentEmploymentType=ifelse(is.na(CurrentEmploymentType),
                                      "Locations",CurrentEmploymentType),
         title=label) %>% 
  rename(group=CurrentEmploymentType)
edges <- cc_data %>% 
  mutate(last4ccnum = as.character(last4ccnum)) %>%  
  filter(last4ccnum %in% final_tagging$last4ccnum) %>% 
  group_by(last4ccnum, location, day, hour) %>%
  summarise(weight = n()) %>% 
  ungroup()
cc_edges <- edges %>% 
  inner_join(cc_nodes,by = c("last4ccnum")) %>% 
  rename(from = id)
cc_edges <- cc_edges %>% 
  inner_join(cc_nodes,by = c("location" = "label")) %>% 
  rename(to = id) %>% 
  dplyr::select(from, to,day, hour, weight) %>% 
  mutate(time_bin = case_when(hour>=0&hour<6~"Midnight",
                              hour>=6&hour<12~"Morning",
                              hour>=12&hour<18~"Afternoon",
                              hour>=18~"Night"),
         weekday.weekend = ifelse(day %in% c(11,12,18,19),"Weekend","Weekday"),
         day.week = case_when(day==6|day==13~"Monday",
                              day==7|day==14~"Tuesday",
                              day==8|day==15~"Wednesday",
                              day==9|day==16~"Thursday",
                              day==10|day==17~"Friday",
                              day==11|day==18~"Saturday",
                              day==12|day==19~"Sunday",))
cc_edges_dn<- cc_edges %>% 
  filter(time_bin=="Night", weekday.weekend=="Weekday") %>% 
  mutate(color=rainbow(max(day))[day])
# cc_edges_dn$color <- palette(rainbow(7))[cc_edges_dn$day]
visNetwork(cc_nodes, cc_edges_dn, 
           main="Network analysis by location and employee") %>% 
  visIgraphLayout(layout = "layout_on_grid") %>% 
  visOptions(highlightNearest = TRUE, nodesIdSelection = TRUE) %>%
  visEdges(smooth=FALSE, color="color") %>% 
  visLegend()

Figure 17: Network analysis on Weekday Night

  1. Baza Isak and Calixto Nils patronised Ouzeri Elian on several nights at about the same time. From table 10, we observe that on 08/01 and 16/01 their transactions were only 1 and 3 minutes apart respectively, and on 17/01 both had transactions in the evening, 48 minutes apart. A probable deduction is that they are good friends from the same department who have dinner together after working hours.
final_cc <- cc %>% mutate(last4ccnum=as.character(last4ccnum)) %>% 
  left_join(final_tagging, by="last4ccnum") %>% 
  mutate(day=lubridate::day(datetime), hour=lubridate::hour(datetime),
         time_bin = case_when(hour>=0&hour<6~"Midnight",
                              hour>=6&hour<12~"Morning",
                              hour>=12&hour<18~"Afternoon",
                              hour>=18~"Night"),
         weekday.weekend = ifelse(day %in% c(11,12,18,19),"Weekend","Weekday"),
         day.week = case_when(day==6|day==13~"Monday",
                              day==7|day==14~"Tuesday",
                              day==8|day==15~"Wednesday",
                              day==9|day==16~"Thursday",
                              day==10|day==17~"Friday",
                              day==11|day==18~"Saturday",
                              day==12|day==19~"Sunday",))
knitr::kable(final_cc %>% 
             filter(weekday.weekend=="Weekday"&time_bin=="Night") %>% 
             filter(location =="Ouzeri Elian"&(name=="Baza Isak"|name=="Calixto Nils")) %>% 
             select(location, datetime, name, price, CurrentEmploymentType,CurrentEmploymentTitle)
               , "simple",
      caption="Baza Isak and Calixto Nils transactions at Ouzeri Elian on Weekdays Nights")
Table 10: Baza Isak and Calixto Nils transactions at Ouzeri Elian on Weekdays Nights
location datetime name price CurrentEmploymentType CurrentEmploymentTitle
Ouzeri Elian 2014-01-08 21:16:00 Calixto Nils 30.81 Information Technology IT Helpdesk
Ouzeri Elian 2014-01-08 21:17:00 Baza Isak 29.85 Information Technology IT Technician
Ouzeri Elian 2014-01-09 19:42:00 Baza Isak 27.08 Information Technology IT Technician
Ouzeri Elian 2014-01-10 18:52:00 Baza Isak 19.92 Information Technology IT Technician
Ouzeri Elian 2014-01-13 19:30:00 Calixto Nils 28.75 Information Technology IT Helpdesk
Ouzeri Elian 2014-01-14 20:32:00 Baza Isak 11.86 Information Technology IT Technician
Ouzeri Elian 2014-01-15 20:29:00 Baza Isak 23.18 Information Technology IT Technician
Ouzeri Elian 2014-01-16 20:25:00 Baza Isak 23.89 Information Technology IT Technician
Ouzeri Elian 2014-01-16 20:28:00 Calixto Nils 9.91 Information Technology IT Helpdesk
Ouzeri Elian 2014-01-17 19:40:00 Baza Isak 38.60 Information Technology IT Technician
Ouzeri Elian 2014-01-17 20:28:00 Calixto Nils 35.81 Information Technology IT Helpdesk
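
The "minutes apart" comparison for Table 10 can be reproduced with a self-join in base R. The sketch below is illustrative only: the data frame `tx` re-enters the rows of Table 10 by hand, and the 5-minute window is an assumed threshold, not one used in the analysis.

```r
# Rows of Table 10 (weekday night transactions at Ouzeri Elian)
tx <- data.frame(
  location = "Ouzeri Elian",
  name = c("Calixto Nils", "Baza Isak", "Baza Isak", "Baza Isak",
           "Calixto Nils", "Baza Isak", "Baza Isak", "Baza Isak",
           "Calixto Nils", "Baza Isak", "Calixto Nils"),
  datetime = as.POSIXct(c("2014-01-08 21:16:00", "2014-01-08 21:17:00",
                          "2014-01-09 19:42:00", "2014-01-10 18:52:00",
                          "2014-01-13 19:30:00", "2014-01-14 20:32:00",
                          "2014-01-15 20:29:00", "2014-01-16 20:25:00",
                          "2014-01-16 20:28:00", "2014-01-17 19:40:00",
                          "2014-01-17 20:28:00"), tz = "UTC"),
  stringsAsFactors = FALSE
)

# Pair every transaction with every other transaction at the same location
pairs <- merge(tx, tx, by = "location", suffixes = c(".a", ".b"))

# Keep each pair of distinct people once (alphabetical order of names)
pairs <- pairs[pairs$name.a < pairs$name.b, ]

# Gap between the two transactions, in minutes
pairs$gap_min <- abs(as.numeric(difftime(pairs$datetime.a, pairs$datetime.b,
                                         units = "mins")))

# Assumed threshold: transactions within 5 minutes count as "together"
close_pairs <- pairs[pairs$gap_min <= 5, ]
close_pairs <- close_pairs[order(close_pairs$datetime.a), ]
print(close_pairs[, c("name.a", "datetime.a", "name.b", "datetime.b", "gap_min")])
```

Under the assumed 5-minute window, two pairs remain: 08/01 (1 minute apart) and 16/01 (3 minutes apart). The 17/01 transactions are 48 minutes apart and fall outside the window.
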

Apart from the transaction data, we will look into the GPS data for gatherings and potential relationships. Figure 18 shows the stationary GPS coordinates of every employee car.

Figure 18: Stationary GPS points of all cars

  1. Hovering over the location between Arkadiou Park and Sannan Park at coordinates (24.89, 36.06) reveals 57 stationary GPS points. The 57 points belong to the cars of Dedos Lidelse, Osvaldo Hennie and Frente Birgitta. From the GPS timestamps in table 11, Dedos Lidelse’s car stops at the location overnight every day, so the location is likely the home of Dedos Lidelse. The alluvial diagram in figure 19 was therefore used to visualise the time the three employees spent at Dedos Lidelse’s house. We can observe some trends based on the time spent at the location.
Table 11: Stops of Dedos Lidelse’s car at the location (Time_at_location in minutes)
name Arrival.Time Coordinate Next_move_off_time Time_at_location
Dedos Lidelse 2014-01-06 20:18:01 POINT (24.89612 36.06343) 2014-01-07 07:01:01 643.0000
Dedos Lidelse 2014-01-07 20:20:05 POINT (24.89616 36.06332) 2014-01-08 07:12:01 651.9333
Dedos Lidelse 2014-01-08 21:22:01 POINT (24.89617 36.0634) 2014-01-09 07:17:01 595.0000
Dedos Lidelse 2014-01-09 17:39:39 POINT (24.89615 36.06342) 2014-01-10 07:08:01 808.3667
Dedos Lidelse 2014-01-10 23:39:30 POINT (24.89611 36.0634) 2014-01-11 18:51:01 1151.5167
Dedos Lidelse 2014-01-11 20:57:00 POINT (24.89635 36.06331) 2014-01-12 12:30:01 933.0167
Dedos Lidelse 2014-01-12 20:56:03 POINT (24.89614 36.06337) 2014-01-13 07:05:01 608.9667
Dedos Lidelse 2014-01-13 21:09:01 POINT (24.89612 36.06339) 2014-01-14 07:43:01 634.0000
Dedos Lidelse 2014-01-14 20:36:01 POINT (24.8961 36.06338) 2014-01-15 07:41:01 665.0000
Dedos Lidelse 2014-01-15 20:40:08 POINT (24.8961 36.06341) 2014-01-16 07:13:01 632.8833
Dedos Lidelse 2014-01-16 19:49:08 POINT (24.8961 36.06338) 2014-01-17 07:28:01 698.8833
Dedos Lidelse 2014-01-17 17:45:39 POINT (24.89608 36.06337) 2014-01-18 12:38:01 1132.3667
Dedos Lidelse 2014-01-18 19:40:05 POINT (24.89612 36.0634) 2014-01-19 18:25:01 1364.9333

1.1 Frente Birgitta and Osvaldo Hennie often arrive at the location around 1700 hrs and leave at 1900 hrs, on weekdays only.

1.2 Frente Birgitta would often drop by the location twice a day. On those days, Frente Birgitta would arrive around 1700 hrs and leave at 1900 hrs as above, then return to the location after 2000 hrs and leave the following morning.

1.3 Osvaldo Hennie stayed overnight at the location only 5 times over this period.

A probable deduction is that they were having dinner together at Dedos Lidelse’s house. An unofficial relationship might exist between Frente Birgitta and Dedos Lidelse. Furthermore, both employees are from the engineering department, which might further support the deduction.
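
The Time_at_location values in table 11 are the minutes between a car's arrival and its next departure. A minimal base R sketch of that calculation is shown below, using three rows copied from table 11. The data frame `stops` and the `overnight` flag are hypothetical names added for illustration.

```r
# Three stops copied from table 11 (arrival and next move-off time)
stops <- data.frame(
  arrival = as.POSIXct(c("2014-01-06 20:18:01", "2014-01-07 20:20:05",
                         "2014-01-10 23:39:30"), tz = "UTC"),
  next_move = as.POSIXct(c("2014-01-07 07:01:01", "2014-01-08 07:12:01",
                           "2014-01-11 18:51:01"), tz = "UTC")
)

# Time spent at the location, in minutes
stops$minutes <- as.numeric(difftime(stops$next_move, stops$arrival,
                                     units = "mins"))

# A stop is flagged as overnight when the car leaves on a later calendar date
stops$overnight <- as.Date(stops$next_move) > as.Date(stops$arrival)
print(stops)
```

The first stop works out to 643 minutes, matching the first row of table 11, and all three stops are flagged as overnight.
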

Figure 19: Alluvial Diagram of time spent at Dedos Lidelse house

5. Do you see evidence of suspicious activity? Identify 1- 10 locations where you believe the suspicious activity is occurring, and why.

The GPS data of employees’ cars was analysed for unusual driving patterns. For each car trip, the GPS data was processed to derive the two stationary coordinates that mark the start and end of the trip. The straight-line distance between the two coordinates was then calculated as the displacement for the trip.
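
As an illustration of the displacement step, the sketch below computes the great-circle distance between a start and an end coordinate with the haversine formula in base R. It is not necessarily the method used for figure 20: the function `haversine_m()`, the Earth radius of 6,371 km, the end coordinate and the 12-minute driving time are all assumptions made for the example. The start coordinate is one of the stops in table 11.

```r
# Hypothetical helper: great-circle distance in metres (haversine formula)
haversine_m <- function(lon1, lat1, lon2, lat2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding slightly above 1 before asin
  2 * radius_m * asin(pmin(1, sqrt(a)))
}

# One trip: start from table 11, end point and duration are made up
trip <- data.frame(
  start_lon = 24.89612, start_lat = 36.06343,
  end_lon = 24.87000, end_lat = 36.05000,
  minutes = 12
)

# Displacement between the two stationary coordinates
trip$metres <- haversine_m(trip$start_lon, trip$start_lat,
                           trip$end_lon, trip$end_lat)

# Average speed implied by the displacement and the driving time
trip$speed_kmh <- (trip$metres / 1000) / (trip$minutes / 60)
print(trip)
```

Because the displacement is a straight line, the implied speed is a lower bound on the actual average driving speed.
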

Figure 20 shows a scatter plot of distance traveled against driving time for each trip, with a line of best fit to show the average speed. The plots are split by time period and show weekday data only. The points in red are the outliers in each time period, in the 1% quantile range. Although the displacement is not the actual distance traveled by the car, it is a good proxy for the average speed required to get from one location to another.
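
One possible reading of the outlier rule is sketched below: fit a straight line of distance against driving time, then flag the trips whose absolute residual lies in the top 1%. This interpretation is an assumption, as are the synthetic `trips` data. Only the planted point (12 minutes for 0.4 metres) is taken from the observation discussed for figure 20.

```r
# Synthetic trips: driving time in minutes and displacement in metres
trips <- data.frame(minutes = 1:100)
trips$metres <- 500 * trips$minutes + 50 * sin(trips$minutes)

# Planted unusual trip: 12 minutes of driving for 0.4 metres of displacement
trips$metres[12] <- 0.4

# Line of best fit; the slope approximates the average speed
fit <- lm(metres ~ minutes, data = trips)

# Flag trips whose absolute residual is above the 99th percentile
trips$abs_resid <- abs(resid(fit))
cutoff <- quantile(trips$abs_resid, 0.99)
trips$outlier <- trips$abs_resid > cutoff

flagged <- unname(which(trips$outlier))
print(trips[flagged, c("minutes", "metres")])
```

In practice the same fit would be run separately for each time period, since the trend line differs between morning, afternoon and night.
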

Figure 20: Scatter plot of car driving time against distance travelled on weekday

  1. From the plot, we observed that few cars traveled during midnight (0000 to 0600 hrs) and that the average speed in the morning was significantly slower than in the afternoon or night, based on the gradient of the trend line.
  2. In the Afternoon subplot, Mies Minke took 12 minutes to travel 0.4 metres. Such an extreme outlier from the trend line suggests an unusual driving pattern worth investigating. Coupled with the suspicious cc transaction from the earlier section, Mies Minke has several suspicious points throughout the investigation.
  3. Among the 24 outliers, 5 belonged to Mies Minke, 3 to Campo-Corrente Ada and 2 each to Resumir Felix and Vasco-Pais Willem. The remaining outliers each belonged to a different individual. From the breakdown in table 12, we observe that they are mainly from the Executive team or the Security team.
Table 12: Table of unusual vehicle movement
id LastName FirstName CurrentEmploymentType n
24 Mies Minke Security 5
10 Campo-Corrente Ada Executive 3
30 Resumir Felix Security 2
35 Vasco-Pais Willem Executive 2

Figure 21 shows a map with the GPS lines of Mies Minke’s car over the 14 days, with its stationary coordinates as blue dots. The stationary GPS coordinates of the other employees’ cars were also added as markers on the map. From the map, we observe that Mies Minke’s car stopped at some unusual locations, which were neither his house nor points of interest.

1.1 Mies Minke’s car stopped in the south east of the Abila map, near the text “To Port of Abila” on the tourist map, on 07/01/2014 from 1113 to 1231 hours. Apart from Mies Minke, only Osvaldo Hennie, Ferro Inga and Bodrogi Loreto ever visited the location.

1.2 Mies Minke’s car stopped somewhere south west of Bean There Done That on 08/01/2014 from 1132 to 1209 hours. Apart from Mies Minke’s car, only the cars of Osvaldo Hennie, Bodrogi Loreto and Ferro Inga ever visited the location. Bodrogi Loreto’s car also visited on the same day, 08/01/2014, from 1129 to 1140 hours. The GPS data of the other cars shows that the location was also visited on 09/01/2014 and 17/01/2014.

1.3 Mies Minke’s car stopped near Pilau Street twice, on 10/01/2014 and 16/01/2014. Apart from Mies Minke, only the cars of Bodrogi Loreto, Ferro Inga and Osvaldo Hennie stopped at that location. Osvaldo Hennie’s car stopped there on 16/01/2014 from 1122 to 1210 hours, which overlapped with Mies Minke’s car.

1.4 Mies Minke’s car stopped in the north between Coffee Chameleon and Guy’s Gyros on 09/01/2014 and 14/01/2014. Apart from Mies Minke, only the cars of Ferro Inga, Bodrogi Loreto and Osvaldo Hennie visited the location, from 13/01/2014 to 15/01/2014.

1.5 All four locations were visited by the cars of the same group of 4 employees. Those locations were neither points of interest nor popular locations that other employees would visit. Furthermore, all four employees belong to the Security department, and meeting at such unusual locations during weekday lunchtime might suggest suspicious activity among them.

2.1 Mies Minke’s car stopped once at the house of SVP/COO Strum Orhan, from 2306 hours on 08/01/2014 to 0330 hours on 09/01/2014. The timing of the stop is highly suspicious. Furthermore, Bodrogi Loreto’s car arrived at 0332 hours on 09/01/2014 and left the location at 0723 hours in the morning.

2.2 Mies Minke’s car stopped once at the house of SVP/CFO Barranco Ingrid on 14/01/2014 from 0331 to 0747 hours. Similarly, Osvaldo Hennie’s car had stopped at the location earlier, from 2308 hours on 13/01/2014 to 0330 hours on 14/01/2014.

2.3 The Security employees who took turns to be at the two Executives’ houses were the same group of suspicious personnel in part 1 of our observation.

3.1 In the earlier sections, we deduced that the credit card number of Mies Minke (car id 24) is 4434. However, his car GPS data supports the conclusion that he used credit card number 9951 for transactions on 13/01/2014, including the outlier transaction of 10,000 dollars at Frydos Autosupply n’ More.

The four employees, in particular Mies Minke, are highly suspicious because of their unusual car GPS movements throughout the two weeks of data.

Figure 21: GPS data for Mies Minke

From the map in figure 21, we discovered that only five cars ever visited Kronos Capitol. From table 13, we can observe that 4 of the 5 visits occurred on 18/01/2014; 3 of those cars were from the Security department and only Herrero Kanon was from the Engineering department. Furthermore, Herrero Kanon’s car was stationary at that location from 18/01/2014 12:47:34 until it drove off on 19/01/2014 12:38:01. A car remaining at Kronos Capitol overnight is quite suspicious considering that the date was near the disappearance period. A possible deduction is that Herrero Kanon left Kronos Capitol in one of the 3 Security cars and returned the next day to retrieve his vehicle. Another is that Herrero Kanon was engaged in some activity inside Kronos Capitol during that period.

Table 13: Car stop at Kronos Capitol
name CurrentEmploymentType CurrentEmploymentTitle Arrival.Time Next_move_off_time Time_at_location
Vasco-Pais Willem Executive Environmental Safety Advisor 2014-01-11 14:01:07 2014-01-11 17:24:01 202.90
Nubarron Adra Security Badging Office 2014-01-18 10:12:43 2014-01-18 13:22:01 189.30
Herrero Kanon Engineering Geologist 2014-01-18 12:47:34 2014-01-19 12:38:01 1430.45
Bodrogi Loreto Security Site Control 2014-01-18 13:14:04 2014-01-18 15:13:01 118.95
Vann Edvard Security Perimeter Control 2014-01-18 13:23:46 2014-01-18 18:21:01 297.25

In conclusion, the employees in the Security department are very suspicious based on the GPS and credit card transaction data presented. We recommend further investigation to determine whether they were linked to the disappearance in Abila town.

Andrienko, Natalia, Gennady Andrienko, and Georg Fuchs. 2014. “Analysis in Geographic and Semantic Spaces.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/Fraunhofer%20IAIS%20and%20City%20University%20London/.
Chua, Alvin, Ryo Sakai, Jan Aerts, and Andrew Vande Moere. 2014. “KUL-Chua-Mc2.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/KU%20Leuven/.
Croceri, Fernando, and Pablo Guzzi. 2014. “UWB-Smith-Mc2.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/University%20of%20Buenos%20Aires%20-%20Croceri/.
Flores, Jorge Luis Alcoser, Fredy Hernan Gomez Lopez, and Miguel Francisco Jarma Forero. 2014. “UBA-Alcoser-Mc2.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/University%20of%20Buenos%20Aires%20-%20Alcoser/.
Guan, YiFei. 2016. “VAST Challenge 2017:mystery at the Wildlife Preserve.” 2016. https://wiki.smu.edu.sg/1617t3isss608g1/ISSS608_2016-17_T3_Assign_GUAN_YIFEI_Visualization.
Ong, Han Ying. 2016. 2016. https://wiki.smu.edu.sg/1617t1ISSS608g1/ISSS608_2016-17_T1_Assign3_Ong_Han_Ying.
Sahaf, Zahra, Haleh Alemasoom, Rahul Kamal Bhaskar, Julia Parades, Zahra Shakeri, Craig Anslow, Mario Costa Sousa, Faramarz Samavati, and Frank Maurer. 2014. “VAST Challenge 2014.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/University%20of%20Calgary/.
Singhal, Manik, Prakash Lekkala, Shiva Shankar M R, and Parameshwaran Iyer. 2014. “RBEI-IYER-Mc2.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/Central%20South%20University/.
“VAST Challenge 2014:MC2 - Patterns of Life Analysis.” 2014. 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/.
Villordo, Sergio Manuel, Luciano Cabrera Hee Joon Park, Juan M. Bodenheimer, Juan Pablo Ferrandez, and Antonio Tralice. 2014. “UBA - Chanta Miners - Mc2.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/University%20of%20Buenos%20Aires%20-%20Tralice/.
Zhao, Ying, Yanni Peng, Wei Huang, Yong Li, Fangfang Zhou, Zhifang Liao, and Kang Zhang. 2014. “CSU-Zhao-Mc2.” 2014. http://visualdata.wustl.edu/varepository/VAST%20Challenge%202014/challenges/MC2%20-%20Patterns%20of%20Life%20Analysis/entries/Central%20South%20University/.

References

Citation

For attribution, please cite this work as

Lim (2021, July 17). Yong Kai: Assignment: VAST Mini-Challenge 2. Retrieved from https://limyongkai.netlify.app/posts/2021-07-10-vastmc2/

BibTeX citation

@misc{lim2021assignment:,
  author = {Lim, Yong Kai},
  title = {Yong Kai: Assignment: VAST Mini-Challenge 2},
  url = {https://limyongkai.netlify.app/posts/2021-07-10-vastmc2/},
  year = {2021}
}